Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

+2

Documentation/virt/index.rst

···
    :maxdepth: 2

    kvm/index
+   uml/user_mode_linux
    paravirt_ops
+   guest-halt-polling

 .. only:: html and subproject


+1967 -1391

Documentation/virt/kvm/api.txt Documentation/virt/kvm/api.rst

··· 1 + .. SPDX-License-Identifier: GPL-2.0 2 + 3 + =================================================================== 1 4 The Definitive KVM (Kernel-based Virtual Machine) API Documentation 2 5 =================================================================== 3 6 4 7 1. General description 5 - ---------------------- 8 + ====================== 6 9 7 10 The kvm API is a set of ioctls that are issued to control various aspects 8 11 of a virtual machine. The ioctls belong to the following classes: ··· 36 33 was used to create the VM. 37 34 38 35 2. File descriptors 39 - ------------------- 36 + =================== 40 37 41 38 The kvm API is centered around file descriptors. An initial 42 39 open("/dev/kvm") obtains a handle to the kvm subsystem; this handle ··· 73 70 74 71 75 72 3. Extensions 76 - ------------- 73 + ============= 77 74 78 75 As of Linux 2.6.22, the KVM ABI has been stabilized: no backward 79 76 incompatible change are allowed. However, there is an extension ··· 87 84 88 85 89 86 4. API description 90 - ------------------ 87 + ================== 91 88 92 89 This section describes ioctls that can be used to control kvm guests. 93 90 For each ioctl, the following information is provided along with a 94 91 description: 95 92 96 - Capability: which KVM extension provides this ioctl. Can be 'basic', 93 + Capability: 94 + which KVM extension provides this ioctl. Can be 'basic', 97 95 which means that is will be provided by any kernel that supports 98 96 API version 12 (see section 4.1), a KVM_CAP_xyz constant, which 99 97 means availability needs to be checked with KVM_CHECK_EXTENSION ··· 103 99 availability: for kernels that don't support the ioctl, 104 100 the ioctl returns -ENOTTY. 105 101 106 - Architectures: which instruction set architectures provide this ioctl. 102 + Architectures: 103 + which instruction set architectures provide this ioctl. 107 104 x86 includes both i386 and x86_64. 108 105 109 - Type: system, vm, or vcpu. 
106 + Type: 107 + system, vm, or vcpu. 110 108 111 - Parameters: what parameters are accepted by the ioctl. 109 + Parameters: 110 + what parameters are accepted by the ioctl. 112 111 113 - Returns: the return value. General error numbers (EBADF, ENOMEM, EINVAL) 112 + Returns: 113 + the return value. General error numbers (EBADF, ENOMEM, EINVAL) 114 114 are not detailed, but errors with specific meanings are. 115 115 116 116 117 117 4.1 KVM_GET_API_VERSION 118 + ----------------------- 118 119 119 - Capability: basic 120 - Architectures: all 121 - Type: system ioctl 122 - Parameters: none 123 - Returns: the constant KVM_API_VERSION (=12) 120 + :Capability: basic 121 + :Architectures: all 122 + :Type: system ioctl 123 + :Parameters: none 124 + :Returns: the constant KVM_API_VERSION (=12) 124 125 125 126 This identifies the API version as the stable kvm API. It is not 126 127 expected that this number will change. However, Linux 2.6.20 and ··· 136 127 137 128 138 129 4.2 KVM_CREATE_VM 130 + ----------------- 139 131 140 - Capability: basic 141 - Architectures: all 142 - Type: system ioctl 143 - Parameters: machine type identifier (KVM_VM_*) 144 - Returns: a VM fd that can be used to control the new virtual machine. 132 + :Capability: basic 133 + :Architectures: all 134 + :Type: system ioctl 135 + :Parameters: machine type identifier (KVM_VM_*) 136 + :Returns: a VM fd that can be used to control the new virtual machine. 145 137 146 138 The new VM has no virtual cpus and no memory. 147 139 You probably want to use 0 as machine type. ··· 165 155 address used by the VM. The IPA_Bits is encoded in bits[7-0] of the 166 156 machine type identifier. 
167 157 168 - e.g, to configure a guest to use 48bit physical address size : 158 + e.g, to configure a guest to use 48bit physical address size:: 169 159 170 160 vm_fd = ioctl(dev_fd, KVM_CREATE_VM, KVM_VM_TYPE_ARM_IPA_SIZE(48)); 171 161 172 - The requested size (IPA_Bits) must be : 173 - 0 - Implies default size, 40bits (for backward compatibility) 162 + The requested size (IPA_Bits) must be: 174 163 175 - or 176 - 177 - N - Implies N bits, where N is a positive integer such that, 164 + == ========================================================= 165 + 0 Implies default size, 40bits (for backward compatibility) 166 + N Implies N bits, where N is a positive integer such that, 178 167 32 <= N <= Host_IPA_Limit 168 + == ========================================================= 179 169 180 170 Host_IPA_Limit is the maximum possible value for IPA_Bits on the host and 181 171 is dependent on the CPU capability and the kernel configuration. The limit can ··· 189 179 190 180 191 181 4.3 KVM_GET_MSR_INDEX_LIST, KVM_GET_MSR_FEATURE_INDEX_LIST 182 + ---------------------------------------------------------- 192 183 193 - Capability: basic, KVM_CAP_GET_MSR_FEATURES for KVM_GET_MSR_FEATURE_INDEX_LIST 194 - Architectures: x86 195 - Type: system ioctl 196 - Parameters: struct kvm_msr_list (in/out) 197 - Returns: 0 on success; -1 on error 184 + :Capability: basic, KVM_CAP_GET_MSR_FEATURES for KVM_GET_MSR_FEATURE_INDEX_LIST 185 + :Architectures: x86 186 + :Type: system ioctl 187 + :Parameters: struct kvm_msr_list (in/out) 188 + :Returns: 0 on success; -1 on error 189 + 198 190 Errors: 199 - EFAULT: the msr index list cannot be read from or written to 200 - E2BIG: the msr index list is to be to fit in the array specified by 201 - the user. 
202 191 203 - struct kvm_msr_list { 192 + ====== ============================================================ 193 + EFAULT the msr index list cannot be read from or written to 194 + E2BIG the msr index list is to be to fit in the array specified by 195 + the user. 196 + ====== ============================================================ 197 + 198 + :: 199 + 200 + struct kvm_msr_list { 204 201 __u32 nmsrs; /* number of msrs in entries */ 205 202 __u32 indices[0]; 206 - }; 203 + }; 207 204 208 205 The user fills in the size of the indices array in nmsrs, and in return 209 206 kvm adjusts nmsrs to reflect the actual number of msrs and fills in the ··· 231 214 232 215 233 216 4.4 KVM_CHECK_EXTENSION 217 + ----------------------- 234 218 235 - Capability: basic, KVM_CAP_CHECK_EXTENSION_VM for vm ioctl 236 - Architectures: all 237 - Type: system ioctl, vm ioctl 238 - Parameters: extension identifier (KVM_CAP_*) 239 - Returns: 0 if unsupported; 1 (or some other positive integer) if supported 219 + :Capability: basic, KVM_CAP_CHECK_EXTENSION_VM for vm ioctl 220 + :Architectures: all 221 + :Type: system ioctl, vm ioctl 222 + :Parameters: extension identifier (KVM_CAP_*) 223 + :Returns: 0 if unsupported; 1 (or some other positive integer) if supported 240 224 241 225 The API allows the application to query about extensions to the core 242 226 kvm API. Userspace passes an extension identifier (an integer) and ··· 250 232 with KVM_CAP_CHECK_EXTENSION_VM on the vm fd) 251 233 252 234 4.5 KVM_GET_VCPU_MMAP_SIZE 235 + -------------------------- 253 236 254 - Capability: basic 255 - Architectures: all 256 - Type: system ioctl 257 - Parameters: none 258 - Returns: size of vcpu mmap area, in bytes 237 + :Capability: basic 238 + :Architectures: all 239 + :Type: system ioctl 240 + :Parameters: none 241 + :Returns: size of vcpu mmap area, in bytes 259 242 260 243 The KVM_RUN ioctl (cf.) communicates with userspace via a shared 261 244 memory region. 
This ioctl returns the size of that region. See the ··· 264 245 265 246 266 247 4.6 KVM_SET_MEMORY_REGION 248 + ------------------------- 267 249 268 - Capability: basic 269 - Architectures: all 270 - Type: vm ioctl 271 - Parameters: struct kvm_memory_region (in) 272 - Returns: 0 on success, -1 on error 250 + :Capability: basic 251 + :Architectures: all 252 + :Type: vm ioctl 253 + :Parameters: struct kvm_memory_region (in) 254 + :Returns: 0 on success, -1 on error 273 255 274 256 This ioctl is obsolete and has been removed. 275 257 276 258 277 259 4.7 KVM_CREATE_VCPU 260 + ------------------- 278 261 279 - Capability: basic 280 - Architectures: all 281 - Type: vm ioctl 282 - Parameters: vcpu id (apic id on x86) 283 - Returns: vcpu fd on success, -1 on error 262 + :Capability: basic 263 + :Architectures: all 264 + :Type: vm ioctl 265 + :Parameters: vcpu id (apic id on x86) 266 + :Returns: vcpu fd on success, -1 on error 284 267 285 268 This API adds a vcpu to a virtual machine. No more than max_vcpus may be added. 286 269 The vcpu id is an integer in the range [0, max_vcpu_id). ··· 323 302 324 303 325 304 4.8 KVM_GET_DIRTY_LOG (vm ioctl) 305 + -------------------------------- 326 306 327 - Capability: basic 328 - Architectures: all 329 - Type: vm ioctl 330 - Parameters: struct kvm_dirty_log (in/out) 331 - Returns: 0 on success, -1 on error 307 + :Capability: basic 308 + :Architectures: all 309 + :Type: vm ioctl 310 + :Parameters: struct kvm_dirty_log (in/out) 311 + :Returns: 0 on success, -1 on error 332 312 333 - /* for KVM_GET_DIRTY_LOG */ 334 - struct kvm_dirty_log { 313 + :: 314 + 315 + /* for KVM_GET_DIRTY_LOG */ 316 + struct kvm_dirty_log { 335 317 __u32 slot; 336 318 __u32 padding; 337 319 union { 338 320 void __user *dirty_bitmap; /* one bit per page */ 339 321 __u64 padding; 340 322 }; 341 - }; 323 + }; 342 324 343 325 Given a memory slot, return a bitmap containing any pages dirtied 344 326 since the last call to this ioctl. 
Bit 0 is the first page in the ··· 358 334 see the description of the capability. 359 335 360 336 4.9 KVM_SET_MEMORY_ALIAS 337 + ------------------------ 361 338 362 - Capability: basic 363 - Architectures: x86 364 - Type: vm ioctl 365 - Parameters: struct kvm_memory_alias (in) 366 - Returns: 0 (success), -1 (error) 339 + :Capability: basic 340 + :Architectures: x86 341 + :Type: vm ioctl 342 + :Parameters: struct kvm_memory_alias (in) 343 + :Returns: 0 (success), -1 (error) 367 344 368 345 This ioctl is obsolete and has been removed. 369 346 370 347 371 348 4.10 KVM_RUN 349 + ------------ 372 350 373 - Capability: basic 374 - Architectures: all 375 - Type: vcpu ioctl 376 - Parameters: none 377 - Returns: 0 on success, -1 on error 351 + :Capability: basic 352 + :Architectures: all 353 + :Type: vcpu ioctl 354 + :Parameters: none 355 + :Returns: 0 on success, -1 on error 356 + 378 357 Errors: 379 - EINTR: an unmasked signal is pending 358 + 359 + ===== ============================= 360 + EINTR an unmasked signal is pending 361 + ===== ============================= 380 362 381 363 This ioctl is used to run a guest virtual cpu. While there are no 382 364 explicit parameters, there is an implicit parameter block that can be ··· 392 362 393 363 394 364 4.11 KVM_GET_REGS 365 + ----------------- 395 366 396 - Capability: basic 397 - Architectures: all except ARM, arm64 398 - Type: vcpu ioctl 399 - Parameters: struct kvm_regs (out) 400 - Returns: 0 on success, -1 on error 367 + :Capability: basic 368 + :Architectures: all except ARM, arm64 369 + :Type: vcpu ioctl 370 + :Parameters: struct kvm_regs (out) 371 + :Returns: 0 on success, -1 on error 401 372 402 373 Reads the general purpose registers from the vcpu. 
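A minimal sketch of the call described above, not taken from the patch: it wraps KVM_GET_REGS in a helper that returns 0 or a negative errno value. It assumes `vcpu_fd` was obtained from KVM_CREATE_VCPU. The helper names `read_vcpu_regs` and `print_x86_pc` are hypothetical, and the printer is guarded because the layout of `struct kvm_regs` differs per architecture.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Hypothetical helper: fetch the general purpose registers of one vcpu.
 * Returns 0 on success or a negative errno value on failure. */
int read_vcpu_regs(int vcpu_fd, struct kvm_regs *regs)
{
    assert(regs != NULL);
    memset(regs, 0, sizeof(*regs));
    if (ioctl(vcpu_fd, KVM_GET_REGS, regs) == -1)
        return -errno;
    return 0;
}

#if defined(__x86_64__) || defined(__i386__)
/* x86 only: rip and rflags exist only in the x86 layout of struct kvm_regs. */
void print_x86_pc(const struct kvm_regs *regs)
{
    printf("rip=0x%llx rflags=0x%llx\n",
           (unsigned long long)regs->rip,
           (unsigned long long)regs->rflags);
}
#endif
```

The vcpu should not be inside KVM_RUN on another thread while its registers are read this way.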
403 374 404 - /* x86 */ 405 - struct kvm_regs { 375 + :: 376 + 377 + /* x86 */ 378 + struct kvm_regs { 406 379 /* out (KVM_GET_REGS) / in (KVM_SET_REGS) */ 407 380 __u64 rax, rbx, rcx, rdx; 408 381 __u64 rsi, rdi, rsp, rbp; 409 382 __u64 r8, r9, r10, r11; 410 383 __u64 r12, r13, r14, r15; 411 384 __u64 rip, rflags; 412 - }; 385 + }; 413 386 414 - /* mips */ 415 - struct kvm_regs { 387 + /* mips */ 388 + struct kvm_regs { 416 389 /* out (KVM_GET_REGS) / in (KVM_SET_REGS) */ 417 390 __u64 gpr[32]; 418 391 __u64 hi; 419 392 __u64 lo; 420 393 __u64 pc; 421 - }; 394 + }; 422 395 423 396 424 397 4.12 KVM_SET_REGS 398 + ----------------- 425 399 426 - Capability: basic 427 - Architectures: all except ARM, arm64 428 - Type: vcpu ioctl 429 - Parameters: struct kvm_regs (in) 430 - Returns: 0 on success, -1 on error 400 + :Capability: basic 401 + :Architectures: all except ARM, arm64 402 + :Type: vcpu ioctl 403 + :Parameters: struct kvm_regs (in) 404 + :Returns: 0 on success, -1 on error 431 405 432 406 Writes the general purpose registers into the vcpu. 433 407 ··· 439 405 440 406 441 407 4.13 KVM_GET_SREGS 408 + ------------------ 442 409 443 - Capability: basic 444 - Architectures: x86, ppc 445 - Type: vcpu ioctl 446 - Parameters: struct kvm_sregs (out) 447 - Returns: 0 on success, -1 on error 410 + :Capability: basic 411 + :Architectures: x86, ppc 412 + :Type: vcpu ioctl 413 + :Parameters: struct kvm_sregs (out) 414 + :Returns: 0 on success, -1 on error 448 415 449 416 Reads special registers from the vcpu. 
450 417 451 - /* x86 */ 452 - struct kvm_sregs { 418 + :: 419 + 420 + /* x86 */ 421 + struct kvm_sregs { 453 422 struct kvm_segment cs, ds, es, fs, gs, ss; 454 423 struct kvm_segment tr, ldt; 455 424 struct kvm_dtable gdt, idt; ··· 460 423 __u64 efer; 461 424 __u64 apic_base; 462 425 __u64 interrupt_bitmap[(KVM_NR_INTERRUPTS + 63) / 64]; 463 - }; 426 + }; 464 427 465 - /* ppc -- see arch/powerpc/include/uapi/asm/kvm.h */ 428 + /* ppc -- see arch/powerpc/include/uapi/asm/kvm.h */ 466 429 467 430 interrupt_bitmap is a bitmap of pending external interrupts. At most 468 431 one bit may be set. This interrupt has been acknowledged by the APIC ··· 470 433 471 434 472 435 4.14 KVM_SET_SREGS 436 + ------------------ 473 437 474 - Capability: basic 475 - Architectures: x86, ppc 476 - Type: vcpu ioctl 477 - Parameters: struct kvm_sregs (in) 478 - Returns: 0 on success, -1 on error 438 + :Capability: basic 439 + :Architectures: x86, ppc 440 + :Type: vcpu ioctl 441 + :Parameters: struct kvm_sregs (in) 442 + :Returns: 0 on success, -1 on error 479 443 480 444 Writes special registers into the vcpu. See KVM_GET_SREGS for the 481 445 data structures. 482 446 483 447 484 448 4.15 KVM_TRANSLATE 449 + ------------------ 485 450 486 - Capability: basic 487 - Architectures: x86 488 - Type: vcpu ioctl 489 - Parameters: struct kvm_translation (in/out) 490 - Returns: 0 on success, -1 on error 451 + :Capability: basic 452 + :Architectures: x86 453 + :Type: vcpu ioctl 454 + :Parameters: struct kvm_translation (in/out) 455 + :Returns: 0 on success, -1 on error 491 456 492 457 Translates a virtual address according to the vcpu's current address 493 458 translation mode. 
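As an illustration only, the translation request could be issued as sketched below. The helper name `translate_gva` and its use of -EFAULT for "no valid mapping" are assumptions of this sketch; `physical_address` and `valid` are output fields of `struct kvm_translation` as declared in `<linux/kvm.h>`.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Hypothetical helper: translate a guest virtual address on one vcpu.
 * Returns 0 and stores the result in *gpa, or a negative errno value.
 * -EFAULT is this sketch's own convention for "no valid mapping". */
int translate_gva(int vcpu_fd, uint64_t gva, uint64_t *gpa)
{
    struct kvm_translation tr;

    assert(gpa != NULL);
    memset(&tr, 0, sizeof(tr));
    tr.linear_address = gva;        /* the only input field */
    if (ioctl(vcpu_fd, KVM_TRANSLATE, &tr) == -1)
        return -errno;
    if (!tr.valid)                  /* output field from <linux/kvm.h> */
        return -EFAULT;
    *gpa = tr.physical_address;     /* output field from <linux/kvm.h> */
    return 0;
}
```

The result reflects the vcpu's translation mode at the time of the call, so it can go stale once the guest runs again.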
494 459 495 - struct kvm_translation { 460 + :: 461 + 462 + struct kvm_translation { 496 463 /* in */ 497 464 __u64 linear_address; 498 465 ··· 506 465 __u8 writeable; 507 466 __u8 usermode; 508 467 __u8 pad[5]; 509 - }; 468 + }; 510 469 511 470 512 471 4.16 KVM_INTERRUPT 472 + ------------------ 513 473 514 - Capability: basic 515 - Architectures: x86, ppc, mips 516 - Type: vcpu ioctl 517 - Parameters: struct kvm_interrupt (in) 518 - Returns: 0 on success, negative on failure. 474 + :Capability: basic 475 + :Architectures: x86, ppc, mips 476 + :Type: vcpu ioctl 477 + :Parameters: struct kvm_interrupt (in) 478 + :Returns: 0 on success, negative on failure. 519 479 520 480 Queues a hardware interrupt vector to be injected. 521 481 522 - /* for KVM_INTERRUPT */ 523 - struct kvm_interrupt { 482 + :: 483 + 484 + /* for KVM_INTERRUPT */ 485 + struct kvm_interrupt { 524 486 /* in */ 525 487 __u32 irq; 526 - }; 488 + }; 527 489 528 490 X86: 491 + ^^^^ 529 492 530 - Returns: 0 on success, 531 - -EEXIST if an interrupt is already enqueued 532 - -EINVAL the the irq number is invalid 533 - -ENXIO if the PIC is in the kernel 534 - -EFAULT if the pointer is invalid 493 + :Returns: 494 + 495 + ========= =================================== 496 + 0 on success, 497 + -EEXIST if an interrupt is already enqueued 498 + -EINVAL the the irq number is invalid 499 + -ENXIO if the PIC is in the kernel 500 + -EFAULT if the pointer is invalid 501 + ========= =================================== 535 502 536 503 Note 'irq' is an interrupt vector, not an interrupt pin or line. This 537 504 ioctl is useful if the in-kernel PIC is not used. 538 505 539 506 PPC: 507 + ^^^^ 540 508 541 509 Queues an external interrupt to be injected. This ioctl is overleaded 542 510 with 3 different irq values: 543 511 544 512 a) KVM_INTERRUPT_SET 545 513 546 - This injects an edge type external interrupt into the guest once it's ready 547 - to receive interrupts. When injected, the interrupt is done. 
514 + This injects an edge type external interrupt into the guest once it's ready 515 + to receive interrupts. When injected, the interrupt is done. 548 516 549 517 b) KVM_INTERRUPT_UNSET 550 518 551 - This unsets any pending interrupt. 519 + This unsets any pending interrupt. 552 520 553 - Only available with KVM_CAP_PPC_UNSET_IRQ. 521 + Only available with KVM_CAP_PPC_UNSET_IRQ. 554 522 555 523 c) KVM_INTERRUPT_SET_LEVEL 556 524 557 - This injects a level type external interrupt into the guest context. The 558 - interrupt stays pending until a specific ioctl with KVM_INTERRUPT_UNSET 559 - is triggered. 525 + This injects a level type external interrupt into the guest context. The 526 + interrupt stays pending until a specific ioctl with KVM_INTERRUPT_UNSET 527 + is triggered. 560 528 561 - Only available with KVM_CAP_PPC_IRQ_LEVEL. 529 + Only available with KVM_CAP_PPC_IRQ_LEVEL. 562 530 563 531 Note that any value for 'irq' other than the ones stated above is invalid 564 532 and incurs unexpected behavior. ··· 575 525 This is an asynchronous vcpu ioctl and can be invoked from any thread. 576 526 577 527 MIPS: 528 + ^^^^^ 578 529 579 530 Queues an external interrupt to be injected into the virtual CPU. A negative 580 531 interrupt number dequeues the interrupt. ··· 584 533 585 534 586 535 4.17 KVM_DEBUG_GUEST 536 + -------------------- 587 537 588 - Capability: basic 589 - Architectures: none 590 - Type: vcpu ioctl 591 - Parameters: none) 592 - Returns: -1 on error 538 + :Capability: basic 539 + :Architectures: none 540 + :Type: vcpu ioctl 541 + :Parameters: none) 542 + :Returns: -1 on error 593 543 594 544 Support for this has been removed. Use KVM_SET_GUEST_DEBUG instead. 
595 545 596 546 597 547 4.18 KVM_GET_MSRS 548 + ----------------- 598 549 599 - Capability: basic (vcpu), KVM_CAP_GET_MSR_FEATURES (system) 600 - Architectures: x86 601 - Type: system ioctl, vcpu ioctl 602 - Parameters: struct kvm_msrs (in/out) 603 - Returns: number of msrs successfully returned; 604 - -1 on error 550 + :Capability: basic (vcpu), KVM_CAP_GET_MSR_FEATURES (system) 551 + :Architectures: x86 552 + :Type: system ioctl, vcpu ioctl 553 + :Parameters: struct kvm_msrs (in/out) 554 + :Returns: number of msrs successfully returned; 555 + -1 on error 605 556 606 557 When used as a system ioctl: 607 558 Reads the values of MSR-based features that are available for the VM. This ··· 615 562 Reads model-specific registers from the vcpu. Supported msr indices can 616 563 be obtained using KVM_GET_MSR_INDEX_LIST in a system ioctl. 617 564 618 - struct kvm_msrs { 565 + :: 566 + 567 + struct kvm_msrs { 619 568 __u32 nmsrs; /* number of msrs in entries */ 620 569 __u32 pad; 621 570 622 571 struct kvm_msr_entry entries[0]; 623 - }; 572 + }; 624 573 625 - struct kvm_msr_entry { 574 + struct kvm_msr_entry { 626 575 __u32 index; 627 576 __u32 reserved; 628 577 __u64 data; 629 - }; 578 + }; 630 579 631 580 Application code should set the 'nmsrs' member (which indicates the 632 581 size of the entries array) and the 'index' member of each array entry. ··· 636 581 637 582 638 583 4.19 KVM_SET_MSRS 584 + ----------------- 639 585 640 - Capability: basic 641 - Architectures: x86 642 - Type: vcpu ioctl 643 - Parameters: struct kvm_msrs (in) 644 - Returns: number of msrs successfully set (see below), -1 on error 586 + :Capability: basic 587 + :Architectures: x86 588 + :Type: vcpu ioctl 589 + :Parameters: struct kvm_msrs (in) 590 + :Returns: number of msrs successfully set (see below), -1 on error 645 591 646 592 Writes model-specific registers to the vcpu. See KVM_GET_MSRS for the 647 593 data structures. 
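Both KVM_GET_MSRS and KVM_SET_MSRS take a variable-length buffer: a header followed by `nmsrs` entries. The sketch below shows one way to size and fill such a buffer. To stay self-contained it uses local mirror types (`sketch_msrs`, `sketch_msr_entry`, both hypothetical names) that copy the layout quoted above; real code would use `struct kvm_msrs` and `struct kvm_msr_entry` from `<linux/kvm.h>`.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Local mirrors of the layouts quoted above (assumption: same field
 * order and sizes as struct kvm_msr_entry and struct kvm_msrs). */
struct sketch_msr_entry {
    uint32_t index;
    uint32_t reserved;
    uint64_t data;
};

struct sketch_msrs {
    uint32_t nmsrs;                       /* number of msrs in entries */
    uint32_t pad;
    struct sketch_msr_entry entries[];    /* flexible array member */
};

/* Allocate a zeroed buffer with room for n entries and fill in the
 * 'nmsrs' member and each entry's 'index'. The caller frees the result. */
struct sketch_msrs *alloc_msrs(const uint32_t *indices, uint32_t n)
{
    struct sketch_msrs *m;
    uint32_t i;

    assert(indices != NULL || n == 0);
    m = calloc(1, sizeof(*m) + (size_t)n * sizeof(m->entries[0]));
    if (m == NULL)
        return NULL;
    m->nmsrs = n;
    for (i = 0; i < n; i++)
        m->entries[i].index = indices[i];
    return m;
}
```

For KVM_SET_MSRS the caller would also fill each entry's `data` before the ioctl, then compare the return value against `nmsrs` to see how many registers were actually written.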
··· 658 602 659 603 660 604 4.20 KVM_SET_CPUID 605 + ------------------ 661 606 662 - Capability: basic 663 - Architectures: x86 664 - Type: vcpu ioctl 665 - Parameters: struct kvm_cpuid (in) 666 - Returns: 0 on success, -1 on error 607 + :Capability: basic 608 + :Architectures: x86 609 + :Type: vcpu ioctl 610 + :Parameters: struct kvm_cpuid (in) 611 + :Returns: 0 on success, -1 on error 667 612 668 613 Defines the vcpu responses to the cpuid instruction. Applications 669 614 should use the KVM_SET_CPUID2 ioctl if available. 670 615 616 + :: 671 617 672 - struct kvm_cpuid_entry { 618 + struct kvm_cpuid_entry { 673 619 __u32 function; 674 620 __u32 eax; 675 621 __u32 ebx; 676 622 __u32 ecx; 677 623 __u32 edx; 678 624 __u32 padding; 679 - }; 625 + }; 680 626 681 - /* for KVM_SET_CPUID */ 682 - struct kvm_cpuid { 627 + /* for KVM_SET_CPUID */ 628 + struct kvm_cpuid { 683 629 __u32 nent; 684 630 __u32 padding; 685 631 struct kvm_cpuid_entry entries[0]; 686 - }; 632 + }; 687 633 688 634 689 635 4.21 KVM_SET_SIGNAL_MASK 636 + ------------------------ 690 637 691 - Capability: basic 692 - Architectures: all 693 - Type: vcpu ioctl 694 - Parameters: struct kvm_signal_mask (in) 695 - Returns: 0 on success, -1 on error 638 + :Capability: basic 639 + :Architectures: all 640 + :Type: vcpu ioctl 641 + :Parameters: struct kvm_signal_mask (in) 642 + :Returns: 0 on success, -1 on error 696 643 697 644 Defines which signals are blocked during execution of KVM_RUN. This 698 645 signal mask temporarily overrides the threads signal mask. Any ··· 705 646 Note the signal will only be delivered if not blocked by the original 706 647 signal mask. 
707 648 708 - /* for KVM_SET_SIGNAL_MASK */ 709 - struct kvm_signal_mask { 649 + :: 650 + 651 + /* for KVM_SET_SIGNAL_MASK */ 652 + struct kvm_signal_mask { 710 653 __u32 len; 711 654 __u8 sigset[0]; 712 - }; 655 + }; 713 656 714 657 715 658 4.22 KVM_GET_FPU 659 + ---------------- 716 660 717 - Capability: basic 718 - Architectures: x86 719 - Type: vcpu ioctl 720 - Parameters: struct kvm_fpu (out) 721 - Returns: 0 on success, -1 on error 661 + :Capability: basic 662 + :Architectures: x86 663 + :Type: vcpu ioctl 664 + :Parameters: struct kvm_fpu (out) 665 + :Returns: 0 on success, -1 on error 722 666 723 667 Reads the floating point state from the vcpu. 724 668 725 - /* for KVM_GET_FPU and KVM_SET_FPU */ 726 - struct kvm_fpu { 669 + :: 670 + 671 + /* for KVM_GET_FPU and KVM_SET_FPU */ 672 + struct kvm_fpu { 727 673 __u8 fpr[8][16]; 728 674 __u16 fcw; 729 675 __u16 fsw; ··· 740 676 __u8 xmm[16][16]; 741 677 __u32 mxcsr; 742 678 __u32 pad2; 743 - }; 679 + }; 744 680 745 681 746 682 4.23 KVM_SET_FPU 683 + ---------------- 747 684 748 - Capability: basic 749 - Architectures: x86 750 - Type: vcpu ioctl 751 - Parameters: struct kvm_fpu (in) 752 - Returns: 0 on success, -1 on error 685 + :Capability: basic 686 + :Architectures: x86 687 + :Type: vcpu ioctl 688 + :Parameters: struct kvm_fpu (in) 689 + :Returns: 0 on success, -1 on error 753 690 754 691 Writes the floating point state to the vcpu. 
755 692 756 - /* for KVM_GET_FPU and KVM_SET_FPU */ 757 - struct kvm_fpu { 693 + :: 694 + 695 + /* for KVM_GET_FPU and KVM_SET_FPU */ 696 + struct kvm_fpu { 758 697 __u8 fpr[8][16]; 759 698 __u16 fcw; 760 699 __u16 fsw; ··· 769 702 __u8 xmm[16][16]; 770 703 __u32 mxcsr; 771 704 __u32 pad2; 772 - }; 705 + }; 773 706 774 707 775 708 4.24 KVM_CREATE_IRQCHIP 709 + ----------------------- 776 710 777 - Capability: KVM_CAP_IRQCHIP, KVM_CAP_S390_IRQCHIP (s390) 778 - Architectures: x86, ARM, arm64, s390 779 - Type: vm ioctl 780 - Parameters: none 781 - Returns: 0 on success, -1 on error 711 + :Capability: KVM_CAP_IRQCHIP, KVM_CAP_S390_IRQCHIP (s390) 712 + :Architectures: x86, ARM, arm64, s390 713 + :Type: vm ioctl 714 + :Parameters: none 715 + :Returns: 0 on success, -1 on error 782 716 783 717 Creates an interrupt controller model in the kernel. 784 718 On x86, creates a virtual ioapic, a virtual PIC (two PICs, nested), and sets up ··· 795 727 796 728 797 729 4.25 KVM_IRQ_LINE 730 + ----------------- 798 731 799 - Capability: KVM_CAP_IRQCHIP 800 - Architectures: x86, arm, arm64 801 - Type: vm ioctl 802 - Parameters: struct kvm_irq_level 803 - Returns: 0 on success, -1 on error 732 + :Capability: KVM_CAP_IRQCHIP 733 + :Architectures: x86, arm, arm64 734 + :Type: vm ioctl 735 + :Parameters: struct kvm_irq_level 736 + :Returns: 0 on success, -1 on error 804 737 805 738 Sets the level of a GSI input to the interrupt controller model in the kernel. 806 739 On some architectures it is required that an interrupt controller model has ··· 825 756 ARM/arm64 can signal an interrupt either at the CPU level, or at the 826 757 in-kernel irqchip (GIC), and for in-kernel irqchip can tell the GIC to 827 758 use PPIs designated for specific cpus. The irq field is interpreted 828 - like this: 759 + like this:: 829 760 830 761 bits: | 31 ... 28 | 27 ... 24 | 23 ... 16 | 15 ... 
0 | 831 762 field: | vcpu2_index | irq_type | vcpu_index | irq_id | 832 763 833 764 The irq_type field has the following values: 834 - - irq_type[0]: out-of-kernel GIC: irq_id 0 is IRQ, irq_id 1 is FIQ 835 - - irq_type[1]: in-kernel GIC: SPI, irq_id between 32 and 1019 (incl.) 765 + 766 + - irq_type[0]: 767 + out-of-kernel GIC: irq_id 0 is IRQ, irq_id 1 is FIQ 768 + - irq_type[1]: 769 + in-kernel GIC: SPI, irq_id between 32 and 1019 (incl.) 836 770 (the vcpu_index field is ignored) 837 - - irq_type[2]: in-kernel GIC: PPI, irq_id between 16 and 31 (incl.) 771 + - irq_type[2]: 772 + in-kernel GIC: PPI, irq_id between 16 and 31 (incl.) 838 773 839 774 (The irq_id field thus corresponds nicely to the IRQ ID in the ARM GIC specs) 840 775 ··· 852 779 injection of interrupts for the in-kernel irqchip. KVM_IRQ_LINE can always 853 780 be used for a userspace interrupt controller. 854 781 855 - struct kvm_irq_level { 782 + :: 783 + 784 + struct kvm_irq_level { 856 785 union { 857 786 __u32 irq; /* GSI */ 858 787 __s32 status; /* not used for KVM_IRQ_LEVEL */ 859 788 }; 860 789 __u32 level; /* 0 or 1 */ 861 - }; 790 + }; 862 791 863 792 864 793 4.26 KVM_GET_IRQCHIP 794 + -------------------- 865 795 866 - Capability: KVM_CAP_IRQCHIP 867 - Architectures: x86 868 - Type: vm ioctl 869 - Parameters: struct kvm_irqchip (in/out) 870 - Returns: 0 on success, -1 on error 796 + :Capability: KVM_CAP_IRQCHIP 797 + :Architectures: x86 798 + :Type: vm ioctl 799 + :Parameters: struct kvm_irqchip (in/out) 800 + :Returns: 0 on success, -1 on error 871 801 872 802 Reads the state of a kernel interrupt controller created with 873 803 KVM_CREATE_IRQCHIP into a buffer provided by the caller. 
874 804 875 - struct kvm_irqchip { 805 + :: 806 + 807 + struct kvm_irqchip { 876 808 __u32 chip_id; /* 0 = PIC1, 1 = PIC2, 2 = IOAPIC */ 877 809 __u32 pad; 878 810 union { ··· 885 807 struct kvm_pic_state pic; 886 808 struct kvm_ioapic_state ioapic; 887 809 } chip; 888 - }; 810 + }; 889 811 890 812 891 813 4.27 KVM_SET_IRQCHIP 814 + -------------------- 892 815 893 - Capability: KVM_CAP_IRQCHIP 894 - Architectures: x86 895 - Type: vm ioctl 896 - Parameters: struct kvm_irqchip (in) 897 - Returns: 0 on success, -1 on error 816 + :Capability: KVM_CAP_IRQCHIP 817 + :Architectures: x86 818 + :Type: vm ioctl 819 + :Parameters: struct kvm_irqchip (in) 820 + :Returns: 0 on success, -1 on error 898 821 899 822 Sets the state of a kernel interrupt controller created with 900 823 KVM_CREATE_IRQCHIP from a buffer provided by the caller. 901 824 902 - struct kvm_irqchip { 825 + :: 826 + 827 + struct kvm_irqchip { 903 828 __u32 chip_id; /* 0 = PIC1, 1 = PIC2, 2 = IOAPIC */ 904 829 __u32 pad; 905 830 union { ··· 910 829 struct kvm_pic_state pic; 911 830 struct kvm_ioapic_state ioapic; 912 831 } chip; 913 - }; 832 + }; 914 833 915 834 916 835 4.28 KVM_XEN_HVM_CONFIG 836 + ----------------------- 917 837 918 - Capability: KVM_CAP_XEN_HVM 919 - Architectures: x86 920 - Type: vm ioctl 921 - Parameters: struct kvm_xen_hvm_config (in) 922 - Returns: 0 on success, -1 on error 838 + :Capability: KVM_CAP_XEN_HVM 839 + :Architectures: x86 840 + :Type: vm ioctl 841 + :Parameters: struct kvm_xen_hvm_config (in) 842 + :Returns: 0 on success, -1 on error 923 843 924 844 Sets the MSR that the Xen HVM guest uses to initialize its hypercall 925 845 page, and provides the starting address and size of the hypercall ··· 928 846 page of a blob (32- or 64-bit, depending on the vcpu mode) to guest 929 847 memory. 
930 848 931 - struct kvm_xen_hvm_config { 849 + :: 850 + 851 + struct kvm_xen_hvm_config { 932 852 __u32 flags; 933 853 __u32 msr; 934 854 __u64 blob_addr_32; ··· 938 854 __u8 blob_size_32; 939 855 __u8 blob_size_64; 940 856 __u8 pad2[30]; 941 - }; 857 + }; 942 858 943 859 944 860 4.29 KVM_GET_CLOCK 861 + ------------------ 945 862 946 - Capability: KVM_CAP_ADJUST_CLOCK 947 - Architectures: x86 948 - Type: vm ioctl 949 - Parameters: struct kvm_clock_data (out) 950 - Returns: 0 on success, -1 on error 863 + :Capability: KVM_CAP_ADJUST_CLOCK 864 + :Architectures: x86 865 + :Type: vm ioctl 866 + :Parameters: struct kvm_clock_data (out) 867 + :Returns: 0 on success, -1 on error 951 868 952 869 Gets the current timestamp of kvmclock as seen by the current guest. In 953 870 conjunction with KVM_SET_CLOCK, it is used to ensure monotonicity on scenarios ··· 965 880 but the exact value read by each VCPU could differ, because the host 966 881 TSC is not stable. 967 882 968 - struct kvm_clock_data { 883 + :: 884 + 885 + struct kvm_clock_data { 969 886 __u64 clock; /* kvmclock current value */ 970 887 __u32 flags; 971 888 __u32 pad[9]; 972 - }; 889 + }; 973 890 974 891 975 892 4.30 KVM_SET_CLOCK 893 + ------------------ 976 894 977 - Capability: KVM_CAP_ADJUST_CLOCK 978 - Architectures: x86 979 - Type: vm ioctl 980 - Parameters: struct kvm_clock_data (in) 981 - Returns: 0 on success, -1 on error 895 + :Capability: KVM_CAP_ADJUST_CLOCK 896 + :Architectures: x86 897 + :Type: vm ioctl 898 + :Parameters: struct kvm_clock_data (in) 899 + :Returns: 0 on success, -1 on error 982 900 983 901 Sets the current timestamp of kvmclock to the value specified in its parameter. 984 902 In conjunction with KVM_GET_CLOCK, it is used to ensure monotonicity on scenarios 985 903 such as migration. 
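The get/set pairing used for migration can be sketched roughly as follows. This is an illustration rather than a reference implementation: the helper names are hypothetical, and only the `clock` value is carried across, with `flags` left at zero on the destination.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Source side: read the current kvmclock value of the VM.
 * Returns 0 on success or a negative errno value. */
int save_kvmclock(int vm_fd, struct kvm_clock_data *saved)
{
    assert(saved != NULL);
    memset(saved, 0, sizeof(*saved));
    if (ioctl(vm_fd, KVM_GET_CLOCK, saved) == -1)
        return -errno;
    return 0;
}

/* Destination side: restore the saved value so the guest never sees
 * kvmclock go backwards. Only 'clock' is copied; flags stay zero. */
int restore_kvmclock(int vm_fd, const struct kvm_clock_data *saved)
{
    struct kvm_clock_data data;

    assert(saved != NULL);
    memset(&data, 0, sizeof(data));
    data.clock = saved->clock;
    if (ioctl(vm_fd, KVM_SET_CLOCK, &data) == -1)
        return -errno;
    return 0;
}
```

Both helpers take the VM fd, not a vcpu fd, matching the "vm ioctl" type listed for these calls.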
986 904 987 - struct kvm_clock_data { 905 + :: 906 + 907 + struct kvm_clock_data { 988 908 __u64 clock; /* kvmclock current value */ 989 909 __u32 flags; 990 910 __u32 pad[9]; 991 - }; 911 + }; 992 912 993 913 994 914 4.31 KVM_GET_VCPU_EVENTS 915 + ------------------------ 995 916 996 - Capability: KVM_CAP_VCPU_EVENTS 997 - Extended by: KVM_CAP_INTR_SHADOW 998 - Architectures: x86, arm, arm64 999 - Type: vcpu ioctl 1000 - Parameters: struct kvm_vcpu_event (out) 1001 - Returns: 0 on success, -1 on error 917 + :Capability: KVM_CAP_VCPU_EVENTS 918 + :Extended by: KVM_CAP_INTR_SHADOW 919 + :Architectures: x86, arm, arm64 920 + :Type: vcpu ioctl 921 + :Parameters: struct kvm_vcpu_event (out) 922 + :Returns: 0 on success, -1 on error 1002 923 1003 924 X86: 925 + ^^^^ 1004 926 1005 927 Gets currently pending exceptions, interrupts, and NMIs as well as related 1006 928 states of the vcpu. 1007 929 1008 - struct kvm_vcpu_events { 930 + :: 931 + 932 + struct kvm_vcpu_events { 1009 933 struct { 1010 934 __u8 injected; 1011 935 __u8 nr; ··· 1045 951 __u8 reserved[27]; 1046 952 __u8 exception_has_payload; 1047 953 __u64 exception_payload; 1048 - }; 954 + }; 1049 955 1050 956 The following bits are defined in the flags field: 1051 957 ··· 1061 967 KVM_CAP_EXCEPTION_PAYLOAD is enabled. 1062 968 1063 969 ARM/ARM64: 970 + ^^^^^^^^^^ 1064 971 1065 972 If the guest accesses a device that is being emulated by the host kernel in 1066 973 such a way that a real device would generate a physical SError, KVM may make ··· 1101 1006 KVM_SET_VCPU_EVENTS or otherwise) because such an exception is always delivered 1102 1007 directly to the virtual CPU). 
1103 1008 1009 + :: 1104 1010 1105 - struct kvm_vcpu_events { 1011 + struct kvm_vcpu_events { 1106 1012 struct { 1107 1013 __u8 serror_pending; 1108 1014 __u8 serror_has_esr; ··· 1113 1017 __u64 serror_esr; 1114 1018 } exception; 1115 1019 __u32 reserved[12]; 1116 - }; 1020 + }; 1117 1021 1118 1022 4.32 KVM_SET_VCPU_EVENTS 1023 + ------------------------ 1119 1024 1120 - Capability: KVM_CAP_VCPU_EVENTS 1121 - Extended by: KVM_CAP_INTR_SHADOW 1122 - Architectures: x86, arm, arm64 1123 - Type: vcpu ioctl 1124 - Parameters: struct kvm_vcpu_event (in) 1125 - Returns: 0 on success, -1 on error 1025 + :Capability: KVM_CAP_VCPU_EVENTS 1026 + :Extended by: KVM_CAP_INTR_SHADOW 1027 + :Architectures: x86, arm, arm64 1028 + :Type: vcpu ioctl 1029 + :Parameters: struct kvm_vcpu_event (in) 1030 + :Returns: 0 on success, -1 on error 1126 1031 1127 1032 X86: 1033 + ^^^^ 1128 1034 1129 1035 Set pending exceptions, interrupts, and NMIs as well as related states of the 1130 1036 vcpu. ··· 1138 1040 smi.pending. Keep the corresponding bits in the flags field cleared to 1139 1041 suppress overwriting the current in-kernel state. The bits are: 1140 1042 1141 - KVM_VCPUEVENT_VALID_NMI_PENDING - transfer nmi.pending to the kernel 1142 - KVM_VCPUEVENT_VALID_SIPI_VECTOR - transfer sipi_vector 1143 - KVM_VCPUEVENT_VALID_SMM - transfer the smi sub-struct. 1043 + =============================== ================================== 1044 + KVM_VCPUEVENT_VALID_NMI_PENDING transfer nmi.pending to the kernel 1045 + KVM_VCPUEVENT_VALID_SIPI_VECTOR transfer sipi_vector 1046 + KVM_VCPUEVENT_VALID_SMM transfer the smi sub-struct. 1047 + =============================== ================================== 1144 1048 1145 1049 If KVM_CAP_INTR_SHADOW is available, KVM_VCPUEVENT_VALID_SHADOW can be set in 1146 1050 the flags field to signal that interrupt.shadow contains a valid state and ··· 1156 1056 contain a valid state and shall be written into the VCPU. 
1157 1057 1158 1058 ARM/ARM64: 1059 + ^^^^^^^^^^ 1159 1060 1160 1061 User space may need to inject several types of events to the guest. 1161 1062 ··· 1179 1078 1180 1079 1181 1080 4.33 KVM_GET_DEBUGREGS 1081 + ---------------------- 1182 1082 1183 - Capability: KVM_CAP_DEBUGREGS 1184 - Architectures: x86 1185 - Type: vm ioctl 1186 - Parameters: struct kvm_debugregs (out) 1187 - Returns: 0 on success, -1 on error 1083 + :Capability: KVM_CAP_DEBUGREGS 1084 + :Architectures: x86 1085 + :Type: vm ioctl 1086 + :Parameters: struct kvm_debugregs (out) 1087 + :Returns: 0 on success, -1 on error 1188 1088 1189 1089 Reads debug registers from the vcpu. 1190 1090 1191 - struct kvm_debugregs { 1091 + :: 1092 + 1093 + struct kvm_debugregs { 1192 1094 __u64 db[4]; 1193 1095 __u64 dr6; 1194 1096 __u64 dr7; 1195 1097 __u64 flags; 1196 1098 __u64 reserved[9]; 1197 - }; 1099 + }; 1198 1100 1199 1101 1200 1102 4.34 KVM_SET_DEBUGREGS 1103 + ---------------------- 1201 1104 1202 - Capability: KVM_CAP_DEBUGREGS 1203 - Architectures: x86 1204 - Type: vm ioctl 1205 - Parameters: struct kvm_debugregs (in) 1206 - Returns: 0 on success, -1 on error 1105 + :Capability: KVM_CAP_DEBUGREGS 1106 + :Architectures: x86 1107 + :Type: vm ioctl 1108 + :Parameters: struct kvm_debugregs (in) 1109 + :Returns: 0 on success, -1 on error 1207 1110 1208 1111 Writes debug registers into the vcpu. 
1209 1112 ··· 1216 1111 1217 1112 1218 1113 4.35 KVM_SET_USER_MEMORY_REGION 1114 + ------------------------------- 1219 1115 1220 - Capability: KVM_CAP_USER_MEMORY 1221 - Architectures: all 1222 - Type: vm ioctl 1223 - Parameters: struct kvm_userspace_memory_region (in) 1224 - Returns: 0 on success, -1 on error 1116 + :Capability: KVM_CAP_USER_MEMORY 1117 + :Architectures: all 1118 + :Type: vm ioctl 1119 + :Parameters: struct kvm_userspace_memory_region (in) 1120 + :Returns: 0 on success, -1 on error 1225 1121 1226 - struct kvm_userspace_memory_region { 1122 + :: 1123 + 1124 + struct kvm_userspace_memory_region { 1227 1125 __u32 slot; 1228 1126 __u32 flags; 1229 1127 __u64 guest_phys_addr; 1230 1128 __u64 memory_size; /* bytes */ 1231 1129 __u64 userspace_addr; /* start of the userspace allocated memory */ 1232 - }; 1130 + }; 1233 1131 1234 - /* for kvm_memory_region::flags */ 1235 - #define KVM_MEM_LOG_DIRTY_PAGES (1UL << 0) 1236 - #define KVM_MEM_READONLY (1UL << 1) 1132 + /* for kvm_memory_region::flags */ 1133 + #define KVM_MEM_LOG_DIRTY_PAGES (1UL << 0) 1134 + #define KVM_MEM_READONLY (1UL << 1) 1237 1135 1238 1136 This ioctl allows the user to create, modify or delete a guest physical 1239 1137 memory slot. Bits 0-15 of "slot" specify the slot id and this value ··· 1282 1174 1283 1175 1284 1176 4.36 KVM_SET_TSS_ADDR 1177 + --------------------- 1285 1178 1286 - Capability: KVM_CAP_SET_TSS_ADDR 1287 - Architectures: x86 1288 - Type: vm ioctl 1289 - Parameters: unsigned long tss_address (in) 1290 - Returns: 0 on success, -1 on error 1179 + :Capability: KVM_CAP_SET_TSS_ADDR 1180 + :Architectures: x86 1181 + :Type: vm ioctl 1182 + :Parameters: unsigned long tss_address (in) 1183 + :Returns: 0 on success, -1 on error 1291 1184 1292 1185 This ioctl defines the physical address of a three-page region in the guest 1293 1186 physical address space. 
The region must be within the first 4GB of the ··· 1302 1193 1303 1194 1304 1195 4.37 KVM_ENABLE_CAP 1196 + ------------------- 1305 1197 1306 - Capability: KVM_CAP_ENABLE_CAP 1307 - Architectures: mips, ppc, s390 1308 - Type: vcpu ioctl 1309 - Parameters: struct kvm_enable_cap (in) 1310 - Returns: 0 on success; -1 on error 1198 + :Capability: KVM_CAP_ENABLE_CAP 1199 + :Architectures: mips, ppc, s390 1200 + :Type: vcpu ioctl 1201 + :Parameters: struct kvm_enable_cap (in) 1202 + :Returns: 0 on success; -1 on error 1311 1203 1312 - Capability: KVM_CAP_ENABLE_CAP_VM 1313 - Architectures: all 1314 - Type: vcpu ioctl 1315 - Parameters: struct kvm_enable_cap (in) 1316 - Returns: 0 on success; -1 on error 1204 + :Capability: KVM_CAP_ENABLE_CAP_VM 1205 + :Architectures: all 1206 + :Type: vcpu ioctl 1207 + :Parameters: struct kvm_enable_cap (in) 1208 + :Returns: 0 on success; -1 on error 1317 1209 1318 - +Not all extensions are enabled by default. Using this ioctl the application 1319 - can enable an extension, making it available to the guest. 1210 + .. note:: 1211 + 1212 + Not all extensions are enabled by default. Using this ioctl the application 1213 + can enable an extension, making it available to the guest. 1320 1214 1321 1215 On systems that do not support this ioctl, it always fails. On systems that 1322 1216 do support it, it only works for extensions that are supported for enablement. ··· 1327 1215 To check if a capability can be enabled, the KVM_CHECK_EXTENSION ioctl should 1328 1216 be used. 1329 1217 1330 - struct kvm_enable_cap { 1218 + :: 1219 + 1220 + struct kvm_enable_cap { 1331 1221 /* in */ 1332 1222 __u32 cap; 1333 1223 1334 1224 The capability that is supposed to get enabled. 1335 1225 1226 + :: 1227 + 1336 1228 __u32 flags; 1337 1229 1338 1230 A bitfield indicating future enhancements. Has to be 0 for now. 1231 + 1232 + :: 1339 1233 1340 1234 __u64 args[4]; 1341 1235 1342 1236 Arguments for enabling a feature. 
If a feature needs initial values to 1343 1237 function properly, this is the place to put them. 1344 1238 1239 + :: 1240 + 1345 1241 __u8 pad[64]; 1346 - }; 1242 + }; 1347 1243 1348 1244 The vcpu ioctl should be used for vcpu-specific capabilities, the vm ioctl 1349 1245 for vm-wide capabilities. 1350 1246 1351 1247 4.38 KVM_GET_MP_STATE 1248 + --------------------- 1352 1249 1353 - Capability: KVM_CAP_MP_STATE 1354 - Architectures: x86, s390, arm, arm64 1355 - Type: vcpu ioctl 1356 - Parameters: struct kvm_mp_state (out) 1357 - Returns: 0 on success; -1 on error 1250 + :Capability: KVM_CAP_MP_STATE 1251 + :Architectures: x86, s390, arm, arm64 1252 + :Type: vcpu ioctl 1253 + :Parameters: struct kvm_mp_state (out) 1254 + :Returns: 0 on success; -1 on error 1358 1255 1359 - struct kvm_mp_state { 1256 + :: 1257 + 1258 + struct kvm_mp_state { 1360 1259 __u32 mp_state; 1361 - }; 1260 + }; 1362 1261 1363 1262 Returns the vcpu's current "multiprocessing state" (though also valid on 1364 1263 uniprocessor guests). 
1365 1264 1366 1265 Possible values are: 1367 1266 1368 - - KVM_MP_STATE_RUNNABLE: the vcpu is currently running [x86,arm/arm64] 1369 - - KVM_MP_STATE_UNINITIALIZED: the vcpu is an application processor (AP) 1267 + ========================== =============================================== 1268 + KVM_MP_STATE_RUNNABLE the vcpu is currently running [x86,arm/arm64] 1269 + KVM_MP_STATE_UNINITIALIZED the vcpu is an application processor (AP) 1370 1270 which has not yet received an INIT signal [x86] 1371 - - KVM_MP_STATE_INIT_RECEIVED: the vcpu has received an INIT signal, and is 1271 + KVM_MP_STATE_INIT_RECEIVED the vcpu has received an INIT signal, and is 1372 1272 now ready for a SIPI [x86] 1373 - - KVM_MP_STATE_HALTED: the vcpu has executed a HLT instruction and 1273 + KVM_MP_STATE_HALTED the vcpu has executed a HLT instruction and 1374 1274 is waiting for an interrupt [x86] 1375 - - KVM_MP_STATE_SIPI_RECEIVED: the vcpu has just received a SIPI (vector 1275 + KVM_MP_STATE_SIPI_RECEIVED the vcpu has just received a SIPI (vector 1376 1276 accessible via KVM_GET_VCPU_EVENTS) [x86] 1377 - - KVM_MP_STATE_STOPPED: the vcpu is stopped [s390,arm/arm64] 1378 - - KVM_MP_STATE_CHECK_STOP: the vcpu is in a special error state [s390] 1379 - - KVM_MP_STATE_OPERATING: the vcpu is operating (running or halted) 1277 + KVM_MP_STATE_STOPPED the vcpu is stopped [s390,arm/arm64] 1278 + KVM_MP_STATE_CHECK_STOP the vcpu is in a special error state [s390] 1279 + KVM_MP_STATE_OPERATING the vcpu is operating (running or halted) 1380 1280 [s390] 1381 - - KVM_MP_STATE_LOAD: the vcpu is in a special load/startup state 1281 + KVM_MP_STATE_LOAD the vcpu is in a special load/startup state 1382 1282 [s390] 1283 + ========================== =============================================== 1383 1284 1384 1285 On x86, this ioctl is only useful after KVM_CREATE_IRQCHIP. Without an 1385 1286 in-kernel irqchip, the multiprocessing state must be maintained by userspace on 1386 1287 these architectures. 
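The state table above can be turned into a small lookup for debugging output. This is only a sketch: ``mp_state_name`` and ``query_mp_state`` are hypothetical helper names, the returned strings are arbitrary labels, and the code assumes the KVM_MP_STATE_* constants are provided by ``<linux/kvm.h>`` on the build host.

```c
#include <linux/kvm.h>
#include <stddef.h>
#include <sys/ioctl.h>

/* Map an mp_state value to a short label; unknown values are reported as such. */
static const char *mp_state_name(__u32 mp_state)
{
    switch (mp_state) {
    case KVM_MP_STATE_RUNNABLE:      return "runnable";
    case KVM_MP_STATE_UNINITIALIZED: return "uninitialized";
    case KVM_MP_STATE_INIT_RECEIVED: return "init-received";
    case KVM_MP_STATE_HALTED:        return "halted";
    case KVM_MP_STATE_SIPI_RECEIVED: return "sipi-received";
    case KVM_MP_STATE_STOPPED:       return "stopped";
    case KVM_MP_STATE_CHECK_STOP:    return "check-stop";
    case KVM_MP_STATE_OPERATING:     return "operating";
    case KVM_MP_STATE_LOAD:          return "load";
    default:                         return "unknown";
    }
}

/* Query a vcpu (hypothetical vcpu_fd) and return its label, or NULL on error. */
static const char *query_mp_state(int vcpu_fd)
{
    struct kvm_mp_state st = { 0 };

    if (ioctl(vcpu_fd, KVM_GET_MP_STATE, &st) < 0)
        return NULL;
    return mp_state_name(st.mp_state);
}
```

Not every state is meaningful on every architecture, so a caller would normally only expect the subset listed for its platform.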
1387 1288 1388 1289 For arm/arm64: 1290 + ^^^^^^^^^^^^^^ 1389 1291 1390 1292 The only states that are valid are KVM_MP_STATE_STOPPED and 1391 1293 KVM_MP_STATE_RUNNABLE which reflect if the vcpu is paused or not. 1392 1294 1393 1295 4.39 KVM_SET_MP_STATE 1296 + --------------------- 1394 1297 1395 - Capability: KVM_CAP_MP_STATE 1396 - Architectures: x86, s390, arm, arm64 1397 - Type: vcpu ioctl 1398 - Parameters: struct kvm_mp_state (in) 1399 - Returns: 0 on success; -1 on error 1298 + :Capability: KVM_CAP_MP_STATE 1299 + :Architectures: x86, s390, arm, arm64 1300 + :Type: vcpu ioctl 1301 + :Parameters: struct kvm_mp_state (in) 1302 + :Returns: 0 on success; -1 on error 1400 1303 1401 1304 Sets the vcpu's current "multiprocessing state"; see KVM_GET_MP_STATE for 1402 1305 arguments. ··· 1421 1294 these architectures. 1422 1295 1423 1296 For arm/arm64: 1297 + ^^^^^^^^^^^^^^ 1424 1298 1425 1299 The only states that are valid are KVM_MP_STATE_STOPPED and 1426 1300 KVM_MP_STATE_RUNNABLE which reflect if the vcpu should be paused or not. 1427 1301 1428 1302 4.40 KVM_SET_IDENTITY_MAP_ADDR 1303 + ------------------------------ 1429 1304 1430 - Capability: KVM_CAP_SET_IDENTITY_MAP_ADDR 1431 - Architectures: x86 1432 - Type: vm ioctl 1433 - Parameters: unsigned long identity (in) 1434 - Returns: 0 on success, -1 on error 1305 + :Capability: KVM_CAP_SET_IDENTITY_MAP_ADDR 1306 + :Architectures: x86 1307 + :Type: vm ioctl 1308 + :Parameters: unsigned long identity (in) 1309 + :Returns: 0 on success, -1 on error 1435 1310 1436 1311 This ioctl defines the physical address of a one-page region in the guest 1437 1312 physical address space. The region must be within the first 4GB of the ··· 1451 1322 Fails if any VCPU has already been created. 
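Because the ioctl fails once a VCPU exists, it has to be issued right after KVM_CREATE_VM. The following is a rough sketch of that step. The helper names are hypothetical, the 4096-byte page size is an x86 assumption, and the sketch relies on the ioctl taking a pointer to a 64-bit address, as its definition in ``<linux/kvm.h>`` suggests.

```c
#include <linux/kvm.h>
#include <sys/ioctl.h>

#define IDENTITY_MAP_PAGE_SIZE 4096ULL  /* assumed x86 page size */

/* Check that the one-page region lies entirely within the first 4GB. */
static int identity_map_addr_ok(__u64 addr)
{
    return addr + IDENTITY_MAP_PAGE_SIZE <= (1ULL << 32);
}

/*
 * Set the identity map address on a VM (hypothetical vm_fd) before any
 * vcpu is created. Returns 0 on success, -1 on error.
 */
static int set_identity_map(int vm_fd, __u64 identity_addr)
{
    if (!identity_map_addr_ok(identity_addr))
        return -1;

    /* The argument is passed by pointer, not by value. */
    return ioctl(vm_fd, KVM_SET_IDENTITY_MAP_ADDR, &identity_addr) < 0 ? -1 : 0;
}
```

A caller would invoke ``set_identity_map()`` and only then proceed to KVM_CREATE_VCPU.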
1452 1323 1453 1324 4.41 KVM_SET_BOOT_CPU_ID 1325 + ------------------------ 1454 1326 1455 - Capability: KVM_CAP_SET_BOOT_CPU_ID 1456 - Architectures: x86 1457 - Type: vm ioctl 1458 - Parameters: unsigned long vcpu_id 1459 - Returns: 0 on success, -1 on error 1327 + :Capability: KVM_CAP_SET_BOOT_CPU_ID 1328 + :Architectures: x86 1329 + :Type: vm ioctl 1330 + :Parameters: unsigned long vcpu_id 1331 + :Returns: 0 on success, -1 on error 1460 1332 1461 1333 Define which vcpu is the Bootstrap Processor (BSP). Values are the same 1462 1334 as the vcpu id in KVM_CREATE_VCPU. If this ioctl is not called, the default ··· 1465 1335 1466 1336 1467 1337 4.42 KVM_GET_XSAVE 1338 + ------------------ 1468 1339 1469 - Capability: KVM_CAP_XSAVE 1470 - Architectures: x86 1471 - Type: vcpu ioctl 1472 - Parameters: struct kvm_xsave (out) 1473 - Returns: 0 on success, -1 on error 1340 + :Capability: KVM_CAP_XSAVE 1341 + :Architectures: x86 1342 + :Type: vcpu ioctl 1343 + :Parameters: struct kvm_xsave (out) 1344 + :Returns: 0 on success, -1 on error 1474 1345 1475 - struct kvm_xsave { 1346 + 1347 + :: 1348 + 1349 + struct kvm_xsave { 1476 1350 __u32 region[1024]; 1477 - }; 1351 + }; 1478 1352 1479 1353 This ioctl would copy current vcpu's xsave struct to the userspace. 1480 1354 1481 1355 1482 1356 4.43 KVM_SET_XSAVE 1357 + ------------------ 1483 1358 1484 - Capability: KVM_CAP_XSAVE 1485 - Architectures: x86 1486 - Type: vcpu ioctl 1487 - Parameters: struct kvm_xsave (in) 1488 - Returns: 0 on success, -1 on error 1359 + :Capability: KVM_CAP_XSAVE 1360 + :Architectures: x86 1361 + :Type: vcpu ioctl 1362 + :Parameters: struct kvm_xsave (in) 1363 + :Returns: 0 on success, -1 on error 1489 1364 1490 - struct kvm_xsave { 1365 + :: 1366 + 1367 + 1368 + struct kvm_xsave { 1491 1369 __u32 region[1024]; 1492 - }; 1370 + }; 1493 1371 1494 1372 This ioctl would copy userspace's xsave struct to the kernel. 
1495 1373 1496 1374 1497 1375 4.44 KVM_GET_XCRS 1376 + ----------------- 1498 1377 1499 - Capability: KVM_CAP_XCRS 1500 - Architectures: x86 1501 - Type: vcpu ioctl 1502 - Parameters: struct kvm_xcrs (out) 1503 - Returns: 0 on success, -1 on error 1378 + :Capability: KVM_CAP_XCRS 1379 + :Architectures: x86 1380 + :Type: vcpu ioctl 1381 + :Parameters: struct kvm_xcrs (out) 1382 + :Returns: 0 on success, -1 on error 1504 1383 1505 - struct kvm_xcr { 1384 + :: 1385 + 1386 + struct kvm_xcr { 1506 1387 __u32 xcr; 1507 1388 __u32 reserved; 1508 1389 __u64 value; 1509 - }; 1390 + }; 1510 1391 1511 - struct kvm_xcrs { 1392 + struct kvm_xcrs { 1512 1393 __u32 nr_xcrs; 1513 1394 __u32 flags; 1514 1395 struct kvm_xcr xcrs[KVM_MAX_XCRS]; 1515 1396 __u64 padding[16]; 1516 - }; 1397 + }; 1517 1398 1518 1399 This ioctl would copy current vcpu's xcrs to the userspace. 1519 1400 1520 1401 1521 1402 4.45 KVM_SET_XCRS 1403 + ----------------- 1522 1404 1523 - Capability: KVM_CAP_XCRS 1524 - Architectures: x86 1525 - Type: vcpu ioctl 1526 - Parameters: struct kvm_xcrs (in) 1527 - Returns: 0 on success, -1 on error 1405 + :Capability: KVM_CAP_XCRS 1406 + :Architectures: x86 1407 + :Type: vcpu ioctl 1408 + :Parameters: struct kvm_xcrs (in) 1409 + :Returns: 0 on success, -1 on error 1528 1410 1529 - struct kvm_xcr { 1411 + :: 1412 + 1413 + struct kvm_xcr { 1530 1414 __u32 xcr; 1531 1415 __u32 reserved; 1532 1416 __u64 value; 1533 - }; 1417 + }; 1534 1418 1535 - struct kvm_xcrs { 1419 + struct kvm_xcrs { 1536 1420 __u32 nr_xcrs; 1537 1421 __u32 flags; 1538 1422 struct kvm_xcr xcrs[KVM_MAX_XCRS]; 1539 1423 __u64 padding[16]; 1540 - }; 1424 + }; 1541 1425 1542 1426 This ioctl would set vcpu's xcr to the value userspace specified. 
1543 1427 1544 1428 1545 1429 4.46 KVM_GET_SUPPORTED_CPUID 1430 + ---------------------------- 1546 1431 1547 - Capability: KVM_CAP_EXT_CPUID 1548 - Architectures: x86 1549 - Type: system ioctl 1550 - Parameters: struct kvm_cpuid2 (in/out) 1551 - Returns: 0 on success, -1 on error 1432 + :Capability: KVM_CAP_EXT_CPUID 1433 + :Architectures: x86 1434 + :Type: system ioctl 1435 + :Parameters: struct kvm_cpuid2 (in/out) 1436 + :Returns: 0 on success, -1 on error 1552 1437 1553 - struct kvm_cpuid2 { 1438 + :: 1439 + 1440 + struct kvm_cpuid2 { 1554 1441 __u32 nent; 1555 1442 __u32 padding; 1556 1443 struct kvm_cpuid_entry2 entries[0]; 1557 - }; 1444 + }; 1558 1445 1559 - #define KVM_CPUID_FLAG_SIGNIFCANT_INDEX BIT(0) 1560 - #define KVM_CPUID_FLAG_STATEFUL_FUNC BIT(1) 1561 - #define KVM_CPUID_FLAG_STATE_READ_NEXT BIT(2) 1446 + #define KVM_CPUID_FLAG_SIGNIFCANT_INDEX BIT(0) 1447 + #define KVM_CPUID_FLAG_STATEFUL_FUNC BIT(1) 1448 + #define KVM_CPUID_FLAG_STATE_READ_NEXT BIT(2) 1562 1449 1563 - struct kvm_cpuid_entry2 { 1450 + struct kvm_cpuid_entry2 { 1564 1451 __u32 function; 1565 1452 __u32 index; 1566 1453 __u32 flags; ··· 1586 1439 __u32 ecx; 1587 1440 __u32 edx; 1588 1441 __u32 padding[3]; 1589 - }; 1442 + }; 1590 1443 1591 1444 This ioctl returns x86 cpuid features which are supported by both the 1592 1445 hardware and kvm in its default configuration. Userspace can use the ··· 1614 1467 x2apic), may not be present in the host cpu, but are exposed by kvm if it can 1615 1468 emulate them efficiently. 
The fields in each entry are defined as follows: 1616 1469 1617 - function: the eax value used to obtain the entry 1618 - index: the ecx value used to obtain the entry (for entries that are 1470 + function: 1471 + the eax value used to obtain the entry 1472 + 1473 + index: 1474 + the ecx value used to obtain the entry (for entries that are 1619 1475 affected by ecx) 1620 - flags: an OR of zero or more of the following: 1476 + 1477 + flags: 1478 + an OR of zero or more of the following: 1479 + 1621 1480 KVM_CPUID_FLAG_SIGNIFCANT_INDEX: 1622 1481 if the index field is valid 1623 1482 KVM_CPUID_FLAG_STATEFUL_FUNC: ··· 1633 1480 KVM_CPUID_FLAG_STATE_READ_NEXT: 1634 1481 for KVM_CPUID_FLAG_STATEFUL_FUNC entries, set if this entry is 1635 1482 the first entry to be read by a cpu 1636 - eax, ebx, ecx, edx: the values returned by the cpuid instruction for 1483 + 1484 + eax, ebx, ecx, edx: 1485 + the values returned by the cpuid instruction for 1637 1486 this function/index combination 1638 1487 1639 1488 The TSC deadline timer feature (CPUID leaf 1, ecx[24]) is always returned 1640 1489 as false, since the feature depends on KVM_CREATE_IRQCHIP for local APIC 1641 - support. Instead it is reported via 1490 + support. 
Instead it is reported via:: 1642 1491 1643 1492 ioctl(KVM_CHECK_EXTENSION, KVM_CAP_TSC_DEADLINE_TIMER) 1644 1493 ··· 1649 1494 1650 1495 1651 1496 4.47 KVM_PPC_GET_PVINFO 1497 + ----------------------- 1652 1498 1653 - Capability: KVM_CAP_PPC_GET_PVINFO 1654 - Architectures: ppc 1655 - Type: vm ioctl 1656 - Parameters: struct kvm_ppc_pvinfo (out) 1657 - Returns: 0 on success, !0 on error 1499 + :Capability: KVM_CAP_PPC_GET_PVINFO 1500 + :Architectures: ppc 1501 + :Type: vm ioctl 1502 + :Parameters: struct kvm_ppc_pvinfo (out) 1503 + :Returns: 0 on success, !0 on error 1658 1504 1659 - struct kvm_ppc_pvinfo { 1505 + :: 1506 + 1507 + struct kvm_ppc_pvinfo { 1660 1508 __u32 flags; 1661 1509 __u32 hcall[4]; 1662 1510 __u8 pad[108]; 1663 - }; 1511 + }; 1664 1512 1665 1513 This ioctl fetches PV specific information that need to be passed to the guest 1666 1514 using the device tree or other means from vm context. ··· 1673 1515 If any additional field gets added to this structure later on, a bit for that 1674 1516 additional piece of information will be set in the flags bitmap. 1675 1517 1676 - The flags bitmap is defined as: 1518 + The flags bitmap is defined as:: 1677 1519 1678 1520 /* the host supports the ePAPR idle hcall 1679 1521 #define KVM_PPC_PVINFO_FLAGS_EV_IDLE (1<<0) 1680 1522 1681 1523 4.52 KVM_SET_GSI_ROUTING 1524 + ------------------------ 1682 1525 1683 - Capability: KVM_CAP_IRQ_ROUTING 1684 - Architectures: x86 s390 arm arm64 1685 - Type: vm ioctl 1686 - Parameters: struct kvm_irq_routing (in) 1687 - Returns: 0 on success, -1 on error 1526 + :Capability: KVM_CAP_IRQ_ROUTING 1527 + :Architectures: x86 s390 arm arm64 1528 + :Type: vm ioctl 1529 + :Parameters: struct kvm_irq_routing (in) 1530 + :Returns: 0 on success, -1 on error 1688 1531 1689 1532 Sets the GSI routing table entries, overwriting any previously set entries. 
1690 1533 1691 1534 On arm/arm64, GSI routing has the following limitation: 1535 + 1692 1536 - GSI routing does not apply to KVM_IRQ_LINE but only to KVM_IRQFD. 1693 1537 1694 - struct kvm_irq_routing { 1538 + :: 1539 + 1540 + struct kvm_irq_routing { 1695 1541 __u32 nr; 1696 1542 __u32 flags; 1697 1543 struct kvm_irq_routing_entry entries[0]; 1698 - }; 1544 + }; 1699 1545 1700 1546 No flags are specified so far, the corresponding field must be set to zero. 1701 1547 1702 - struct kvm_irq_routing_entry { 1548 + :: 1549 + 1550 + struct kvm_irq_routing_entry { 1703 1551 __u32 gsi; 1704 1552 __u32 type; 1705 1553 __u32 flags; ··· 1717 1553 struct kvm_irq_routing_hv_sint hv_sint; 1718 1554 __u32 pad[8]; 1719 1555 } u; 1720 - }; 1556 + }; 1721 1557 1722 - /* gsi routing entry types */ 1723 - #define KVM_IRQ_ROUTING_IRQCHIP 1 1724 - #define KVM_IRQ_ROUTING_MSI 2 1725 - #define KVM_IRQ_ROUTING_S390_ADAPTER 3 1726 - #define KVM_IRQ_ROUTING_HV_SINT 4 1558 + /* gsi routing entry types */ 1559 + #define KVM_IRQ_ROUTING_IRQCHIP 1 1560 + #define KVM_IRQ_ROUTING_MSI 2 1561 + #define KVM_IRQ_ROUTING_S390_ADAPTER 3 1562 + #define KVM_IRQ_ROUTING_HV_SINT 4 1727 1563 1728 1564 flags: 1565 + 1729 1566 - KVM_MSI_VALID_DEVID: used along with KVM_IRQ_ROUTING_MSI routing entry 1730 1567 type, specifies that the devid field contains a valid value. The per-VM 1731 1568 KVM_CAP_MSI_DEVID capability advertises the requirement to provide ··· 1734 1569 never set the KVM_MSI_VALID_DEVID flag as the ioctl might fail. 
1735 1570 - zero otherwise 1736 1571 1737 - struct kvm_irq_routing_irqchip { 1572 + :: 1573 + 1574 + struct kvm_irq_routing_irqchip { 1738 1575 __u32 irqchip; 1739 1576 __u32 pin; 1740 - }; 1577 + }; 1741 1578 1742 - struct kvm_irq_routing_msi { 1579 + struct kvm_irq_routing_msi { 1743 1580 __u32 address_lo; 1744 1581 __u32 address_hi; 1745 1582 __u32 data; ··· 1749 1582 __u32 pad; 1750 1583 __u32 devid; 1751 1584 }; 1752 - }; 1585 + }; 1753 1586 1754 1587 If KVM_MSI_VALID_DEVID is set, devid contains a unique device identifier 1755 1588 for the device that wrote the MSI message. For PCI, this is usually a ··· 1760 1593 address_hi bits 31-8 provide bits 31-8 of the destination id. Bits 7-0 of 1761 1594 address_hi must be zero. 1762 1595 1763 - struct kvm_irq_routing_s390_adapter { 1596 + :: 1597 + 1598 + struct kvm_irq_routing_s390_adapter { 1764 1599 __u64 ind_addr; 1765 1600 __u64 summary_addr; 1766 1601 __u64 ind_offset; 1767 1602 __u32 summary_offset; 1768 1603 __u32 adapter_id; 1769 - }; 1604 + }; 1770 1605 1771 - struct kvm_irq_routing_hv_sint { 1606 + struct kvm_irq_routing_hv_sint { 1772 1607 __u32 vcpu; 1773 1608 __u32 sint; 1774 - }; 1609 + }; 1775 1610 1776 1611 1777 1612 4.55 KVM_SET_TSC_KHZ 1613 + -------------------- 1778 1614 1779 - Capability: KVM_CAP_TSC_CONTROL 1780 - Architectures: x86 1781 - Type: vcpu ioctl 1782 - Parameters: virtual tsc_khz 1783 - Returns: 0 on success, -1 on error 1615 + :Capability: KVM_CAP_TSC_CONTROL 1616 + :Architectures: x86 1617 + :Type: vcpu ioctl 1618 + :Parameters: virtual tsc_khz 1619 + :Returns: 0 on success, -1 on error 1784 1620 1785 1621 Specifies the tsc frequency for the virtual machine. The unit of the 1786 1622 frequency is KHz. 
1787 1623 1788 1624 1789 1625 4.56 KVM_GET_TSC_KHZ 1626 + -------------------- 1790 1627 1791 - Capability: KVM_CAP_GET_TSC_KHZ 1792 - Architectures: x86 1793 - Type: vcpu ioctl 1794 - Parameters: none 1795 - Returns: virtual tsc-khz on success, negative value on error 1628 + :Capability: KVM_CAP_GET_TSC_KHZ 1629 + :Architectures: x86 1630 + :Type: vcpu ioctl 1631 + :Parameters: none 1632 + :Returns: virtual tsc-khz on success, negative value on error 1796 1633 1797 1634 Returns the tsc frequency of the guest. The unit of the return value is 1798 1635 KHz. If the host has unstable tsc this ioctl returns -EIO instead as an ··· 1804 1633 1805 1634 1806 1635 4.57 KVM_GET_LAPIC 1636 + ------------------ 1807 1637 1808 - Capability: KVM_CAP_IRQCHIP 1809 - Architectures: x86 1810 - Type: vcpu ioctl 1811 - Parameters: struct kvm_lapic_state (out) 1812 - Returns: 0 on success, -1 on error 1638 + :Capability: KVM_CAP_IRQCHIP 1639 + :Architectures: x86 1640 + :Type: vcpu ioctl 1641 + :Parameters: struct kvm_lapic_state (out) 1642 + :Returns: 0 on success, -1 on error 1813 1643 1814 - #define KVM_APIC_REG_SIZE 0x400 1815 - struct kvm_lapic_state { 1644 + :: 1645 + 1646 + #define KVM_APIC_REG_SIZE 0x400 1647 + struct kvm_lapic_state { 1816 1648 char regs[KVM_APIC_REG_SIZE]; 1817 - }; 1649 + }; 1818 1650 1819 1651 Reads the Local APIC registers and copies them into the input argument. The 1820 1652 data format and layout are the same as documented in the architecture manual. 
··· 1835 1661 1836 1662 1837 1663 4.58 KVM_SET_LAPIC 1664 + ------------------ 1838 1665 1839 - Capability: KVM_CAP_IRQCHIP 1840 - Architectures: x86 1841 - Type: vcpu ioctl 1842 - Parameters: struct kvm_lapic_state (in) 1843 - Returns: 0 on success, -1 on error 1666 + :Capability: KVM_CAP_IRQCHIP 1667 + :Architectures: x86 1668 + :Type: vcpu ioctl 1669 + :Parameters: struct kvm_lapic_state (in) 1670 + :Returns: 0 on success, -1 on error 1844 1671 1845 - #define KVM_APIC_REG_SIZE 0x400 1846 - struct kvm_lapic_state { 1672 + :: 1673 + 1674 + #define KVM_APIC_REG_SIZE 0x400 1675 + struct kvm_lapic_state { 1847 1676 char regs[KVM_APIC_REG_SIZE]; 1848 - }; 1677 + }; 1849 1678 1850 1679 Copies the input argument into the Local APIC registers. The data format 1851 1680 and layout are the same as documented in the architecture manual. ··· 1859 1682 1860 1683 1861 1684 4.59 KVM_IOEVENTFD 1685 + ------------------ 1862 1686 1863 - Capability: KVM_CAP_IOEVENTFD 1864 - Architectures: all 1865 - Type: vm ioctl 1866 - Parameters: struct kvm_ioeventfd (in) 1867 - Returns: 0 on success, !0 on error 1687 + :Capability: KVM_CAP_IOEVENTFD 1688 + :Architectures: all 1689 + :Type: vm ioctl 1690 + :Parameters: struct kvm_ioeventfd (in) 1691 + :Returns: 0 on success, !0 on error 1868 1692 1869 1693 This ioctl attaches or detaches an ioeventfd to a legal pio/mmio address 1870 1694 within the guest. A guest write in the registered address will signal the 1871 1695 provided event instead of triggering an exit. 1872 1696 1873 - struct kvm_ioeventfd { 1697 + :: 1698 + 1699 + struct kvm_ioeventfd { 1874 1700 __u64 datamatch; 1875 1701 __u64 addr; /* legal pio/mmio address */ 1876 1702 __u32 len; /* 0, 1, 2, 4, or 8 bytes */ 1877 1703 __s32 fd; 1878 1704 __u32 flags; 1879 1705 __u8 pad[36]; 1880 - }; 1706 + }; 1881 1707 1882 1708 For the special case of virtio-ccw devices on s390, the ioevent is matched 1883 1709 to a subchannel/virtqueue tuple instead. 
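As an illustration of how the structure might be filled for a plain MMIO doorbell, consider the sketch below. The helper names ``make_mmio_ioeventfd`` and ``assign_ioeventfd`` are hypothetical, and the event descriptor is assumed to come from eventfd(2). Only the datamatch flag is shown; the other flags are left clear.

```c
#include <linux/kvm.h>
#include <string.h>
#include <sys/ioctl.h>

/*
 * Build a kvm_ioeventfd for an MMIO address. If match is non-zero the
 * event is restricted to guest writes of exactly 'value'.
 */
static struct kvm_ioeventfd make_mmio_ioeventfd(__u64 addr, __u32 len,
                                                int event_fd, int match,
                                                __u64 value)
{
    struct kvm_ioeventfd ev;

    memset(&ev, 0, sizeof(ev));     /* also zeroes the pad field */
    ev.addr = addr;
    ev.len = len;                   /* 0, 1, 2, 4, or 8 bytes */
    ev.fd = event_fd;
    if (match) {
        ev.flags |= KVM_IOEVENTFD_FLAG_DATAMATCH;
        ev.datamatch = value;
    }
    return ev;
}

/* Attach the ioeventfd to a VM (hypothetical vm_fd). Returns 0 or -1. */
static int assign_ioeventfd(int vm_fd, const struct kvm_ioeventfd *ev)
{
    return ioctl(vm_fd, KVM_IOEVENTFD, ev) != 0 ? -1 : 0;
}
```

Keeping construction separate from the ioctl makes it possible to reuse the same structure with the deassign flag set when detaching.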
1884 1710 1885 - The following flags are defined: 1711 + The following flags are defined:: 1886 1712 1887 - #define KVM_IOEVENTFD_FLAG_DATAMATCH (1 << kvm_ioeventfd_flag_nr_datamatch) 1888 - #define KVM_IOEVENTFD_FLAG_PIO (1 << kvm_ioeventfd_flag_nr_pio) 1889 - #define KVM_IOEVENTFD_FLAG_DEASSIGN (1 << kvm_ioeventfd_flag_nr_deassign) 1890 - #define KVM_IOEVENTFD_FLAG_VIRTIO_CCW_NOTIFY \ 1713 + #define KVM_IOEVENTFD_FLAG_DATAMATCH (1 << kvm_ioeventfd_flag_nr_datamatch) 1714 + #define KVM_IOEVENTFD_FLAG_PIO (1 << kvm_ioeventfd_flag_nr_pio) 1715 + #define KVM_IOEVENTFD_FLAG_DEASSIGN (1 << kvm_ioeventfd_flag_nr_deassign) 1716 + #define KVM_IOEVENTFD_FLAG_VIRTIO_CCW_NOTIFY \ 1891 1717 (1 << kvm_ioeventfd_flag_nr_virtio_ccw_notify) 1892 1718 1893 1719 If datamatch flag is set, the event will be signaled only if the written value ··· 1905 1725 work anyway. 1906 1726 1907 1727 4.60 KVM_DIRTY_TLB 1728 + ------------------ 1908 1729 1909 - Capability: KVM_CAP_SW_TLB 1910 - Architectures: ppc 1911 - Type: vcpu ioctl 1912 - Parameters: struct kvm_dirty_tlb (in) 1913 - Returns: 0 on success, -1 on error 1730 + :Capability: KVM_CAP_SW_TLB 1731 + :Architectures: ppc 1732 + :Type: vcpu ioctl 1733 + :Parameters: struct kvm_dirty_tlb (in) 1734 + :Returns: 0 on success, -1 on error 1914 1735 1915 - struct kvm_dirty_tlb { 1736 + :: 1737 + 1738 + struct kvm_dirty_tlb { 1916 1739 __u64 bitmap; 1917 1740 __u32 num_dirty; 1918 - }; 1741 + }; 1919 1742 1920 1743 This must be called whenever userspace has changed an entry in the shared 1921 1744 TLB, prior to calling KVM_RUN on the associated vcpu. 
··· 1941 1758 1942 1759 1943 1760 4.62 KVM_CREATE_SPAPR_TCE 1761 + ------------------------- 1944 1762 1945 - Capability: KVM_CAP_SPAPR_TCE 1946 - Architectures: powerpc 1947 - Type: vm ioctl 1948 - Parameters: struct kvm_create_spapr_tce (in) 1949 - Returns: file descriptor for manipulating the created TCE table 1763 + :Capability: KVM_CAP_SPAPR_TCE 1764 + :Architectures: powerpc 1765 + :Type: vm ioctl 1766 + :Parameters: struct kvm_create_spapr_tce (in) 1767 + :Returns: file descriptor for manipulating the created TCE table 1950 1768 1951 1769 This creates a virtual TCE (translation control entry) table, which 1952 1770 is an IOMMU for PAPR-style virtual I/O. It is used to translate 1953 1771 logical addresses used in virtual I/O into guest physical addresses, 1954 1772 and provides a scatter/gather capability for PAPR virtual I/O. 1955 1773 1956 - /* for KVM_CAP_SPAPR_TCE */ 1957 - struct kvm_create_spapr_tce { 1774 + :: 1775 + 1776 + /* for KVM_CAP_SPAPR_TCE */ 1777 + struct kvm_create_spapr_tce { 1958 1778 __u64 liobn; 1959 1779 __u32 window_size; 1960 - }; 1780 + }; 1961 1781 1962 1782 The liobn field gives the logical IO bus number for which to create a 1963 1783 TCE table. The window_size field specifies the size of the DMA window ··· 1980 1794 1981 1795 1982 1796 4.63 KVM_ALLOCATE_RMA 1797 + --------------------- 1983 1798 1984 - Capability: KVM_CAP_PPC_RMA 1985 - Architectures: powerpc 1986 - Type: vm ioctl 1987 - Parameters: struct kvm_allocate_rma (out) 1988 - Returns: file descriptor for mapping the allocated RMA 1799 + :Capability: KVM_CAP_PPC_RMA 1800 + :Architectures: powerpc 1801 + :Type: vm ioctl 1802 + :Parameters: struct kvm_allocate_rma (out) 1803 + :Returns: file descriptor for mapping the allocated RMA 1989 1804 1990 1805 This allocates a Real Mode Area (RMA) from the pool allocated at boot 1991 1806 time by the kernel. 
An RMA is a physically-contiguous, aligned region ··· 1995 1808 POWER processors support a set of sizes for the RMA that usually 1996 1809 includes 64MB, 128MB, 256MB and some larger powers of two. 1997 1810 1998 - /* for KVM_ALLOCATE_RMA */ 1999 - struct kvm_allocate_rma { 1811 + :: 1812 + 1813 + /* for KVM_ALLOCATE_RMA */ 1814 + struct kvm_allocate_rma { 2000 1815 __u64 rma_size; 2001 - }; 1816 + }; 2002 1817 2003 1818 The return value is a file descriptor which can be passed to mmap(2) 2004 1819 to map the allocated RMA into userspace. The mapped area can then be ··· 2016 1827 2017 1828 2018 1829 4.64 KVM_NMI 1830 + ------------ 2019 1831 2020 - Capability: KVM_CAP_USER_NMI 2021 - Architectures: x86 2022 - Type: vcpu ioctl 2023 - Parameters: none 2024 - Returns: 0 on success, -1 on error 1832 + :Capability: KVM_CAP_USER_NMI 1833 + :Architectures: x86 1834 + :Type: vcpu ioctl 1835 + :Parameters: none 1836 + :Returns: 0 on success, -1 on error 2025 1837 2026 1838 Queues an NMI on the thread's vcpu. 
Note this is well defined only 2027 1839 when KVM_CREATE_IRQCHIP has not been called, since this is an interface ··· 2043 1853 2044 1854 2045 1855 4.65 KVM_S390_UCAS_MAP 1856 + ---------------------- 2046 1857 2047 - Capability: KVM_CAP_S390_UCONTROL 2048 - Architectures: s390 2049 - Type: vcpu ioctl 2050 - Parameters: struct kvm_s390_ucas_mapping (in) 2051 - Returns: 0 in case of success 1858 + :Capability: KVM_CAP_S390_UCONTROL 1859 + :Architectures: s390 1860 + :Type: vcpu ioctl 1861 + :Parameters: struct kvm_s390_ucas_mapping (in) 1862 + :Returns: 0 in case of success 2052 1863 2053 - The parameter is defined like this: 1864 + The parameter is defined like this:: 1865 + 2054 1866 struct kvm_s390_ucas_mapping { 2055 1867 __u64 user_addr; 2056 1868 __u64 vcpu_addr; ··· 2065 1873 2066 1874 2067 1875 4.66 KVM_S390_UCAS_UNMAP 1876 + ------------------------ 2068 1877 2069 - Capability: KVM_CAP_S390_UCONTROL 2070 - Architectures: s390 2071 - Type: vcpu ioctl 2072 - Parameters: struct kvm_s390_ucas_mapping (in) 2073 - Returns: 0 in case of success 1878 + :Capability: KVM_CAP_S390_UCONTROL 1879 + :Architectures: s390 1880 + :Type: vcpu ioctl 1881 + :Parameters: struct kvm_s390_ucas_mapping (in) 1882 + :Returns: 0 in case of success 2074 1883 2075 - The parameter is defined like this: 1884 + The parameter is defined like this:: 1885 + 2076 1886 struct kvm_s390_ucas_mapping { 2077 1887 __u64 user_addr; 2078 1888 __u64 vcpu_addr; ··· 2087 1893 2088 1894 2089 1895 4.67 KVM_S390_VCPU_FAULT 1896 + ------------------------ 2090 1897 2091 - Capability: KVM_CAP_S390_UCONTROL 2092 - Architectures: s390 2093 - Type: vcpu ioctl 2094 - Parameters: vcpu absolute address (in) 2095 - Returns: 0 in case of success 1898 + :Capability: KVM_CAP_S390_UCONTROL 1899 + :Architectures: s390 1900 + :Type: vcpu ioctl 1901 + :Parameters: vcpu absolute address (in) 1902 + :Returns: 0 in case of success 2096 1903 2097 1904 This call creates a page table entry on the virtual cpu's address space 2098 
1905 (for user controlled virtual machines) or the virtual machine's address ··· 2105 1910 2106 1911 2107 1912 4.68 KVM_SET_ONE_REG 1913 + -------------------- 2108 1914 2109 - Capability: KVM_CAP_ONE_REG 2110 - Architectures: all 2111 - Type: vcpu ioctl 2112 - Parameters: struct kvm_one_reg (in) 2113 - Returns: 0 on success, negative value on failure 1915 + :Capability: KVM_CAP_ONE_REG 1916 + :Architectures: all 1917 + :Type: vcpu ioctl 1918 + :Parameters: struct kvm_one_reg (in) 1919 + :Returns: 0 on success, negative value on failure 1920 + 2114 1921 Errors: 2115 - ENOENT: no such register 2116 - EINVAL: invalid register ID, or no such register 2117 - EPERM: (arm64) register access not allowed before vcpu finalization 1922 + 1923 + ====== ============================================================ 1924 + ENOENT no such register 1925 + EINVAL invalid register ID, or no such register 1926 + EPERM (arm64) register access not allowed before vcpu finalization 1927 + ====== ============================================================ 1928 + 2118 1929 (These error codes are indicative only: do not rely on a specific error 2119 1930 code being returned in a specific situation.) 2120 1931 2121 - struct kvm_one_reg { 1932 + :: 1933 + 1934 + struct kvm_one_reg { 2122 1935 __u64 id; 2123 1936 __u64 addr; 2124 - }; 1937 + }; 2125 1938 2126 1939 Using this ioctl, a single vcpu register can be set to a specific value 2127 1940 defined by user space with the passed in struct kvm_one_reg, where id ··· 2139 1936 and their own constants and width. 
To keep track of the implemented 2140 1937 registers, find a list below: 2141 1938 2142 - Arch | Register | Width (bits) 2143 - | | 2144 - PPC | KVM_REG_PPC_HIOR | 64 2145 - PPC | KVM_REG_PPC_IAC1 | 64 2146 - PPC | KVM_REG_PPC_IAC2 | 64 2147 - PPC | KVM_REG_PPC_IAC3 | 64 2148 - PPC | KVM_REG_PPC_IAC4 | 64 2149 - PPC | KVM_REG_PPC_DAC1 | 64 2150 - PPC | KVM_REG_PPC_DAC2 | 64 2151 - PPC | KVM_REG_PPC_DABR | 64 2152 - PPC | KVM_REG_PPC_DSCR | 64 2153 - PPC | KVM_REG_PPC_PURR | 64 2154 - PPC | KVM_REG_PPC_SPURR | 64 2155 - PPC | KVM_REG_PPC_DAR | 64 2156 - PPC | KVM_REG_PPC_DSISR | 32 2157 - PPC | KVM_REG_PPC_AMR | 64 2158 - PPC | KVM_REG_PPC_UAMOR | 64 2159 - PPC | KVM_REG_PPC_MMCR0 | 64 2160 - PPC | KVM_REG_PPC_MMCR1 | 64 2161 - PPC | KVM_REG_PPC_MMCRA | 64 2162 - PPC | KVM_REG_PPC_MMCR2 | 64 2163 - PPC | KVM_REG_PPC_MMCRS | 64 2164 - PPC | KVM_REG_PPC_SIAR | 64 2165 - PPC | KVM_REG_PPC_SDAR | 64 2166 - PPC | KVM_REG_PPC_SIER | 64 2167 - PPC | KVM_REG_PPC_PMC1 | 32 2168 - PPC | KVM_REG_PPC_PMC2 | 32 2169 - PPC | KVM_REG_PPC_PMC3 | 32 2170 - PPC | KVM_REG_PPC_PMC4 | 32 2171 - PPC | KVM_REG_PPC_PMC5 | 32 2172 - PPC | KVM_REG_PPC_PMC6 | 32 2173 - PPC | KVM_REG_PPC_PMC7 | 32 2174 - PPC | KVM_REG_PPC_PMC8 | 32 2175 - PPC | KVM_REG_PPC_FPR0 | 64 2176 - ... 2177 - PPC | KVM_REG_PPC_FPR31 | 64 2178 - PPC | KVM_REG_PPC_VR0 | 128 2179 - ... 2180 - PPC | KVM_REG_PPC_VR31 | 128 2181 - PPC | KVM_REG_PPC_VSR0 | 128 2182 - ... 
2183 - PPC | KVM_REG_PPC_VSR31 | 128 2184 - PPC | KVM_REG_PPC_FPSCR | 64 2185 - PPC | KVM_REG_PPC_VSCR | 32 2186 - PPC | KVM_REG_PPC_VPA_ADDR | 64 2187 - PPC | KVM_REG_PPC_VPA_SLB | 128 2188 - PPC | KVM_REG_PPC_VPA_DTL | 128 2189 - PPC | KVM_REG_PPC_EPCR | 32 2190 - PPC | KVM_REG_PPC_EPR | 32 2191 - PPC | KVM_REG_PPC_TCR | 32 2192 - PPC | KVM_REG_PPC_TSR | 32 2193 - PPC | KVM_REG_PPC_OR_TSR | 32 2194 - PPC | KVM_REG_PPC_CLEAR_TSR | 32 2195 - PPC | KVM_REG_PPC_MAS0 | 32 2196 - PPC | KVM_REG_PPC_MAS1 | 32 2197 - PPC | KVM_REG_PPC_MAS2 | 64 2198 - PPC | KVM_REG_PPC_MAS7_3 | 64 2199 - PPC | KVM_REG_PPC_MAS4 | 32 2200 - PPC | KVM_REG_PPC_MAS6 | 32 2201 - PPC | KVM_REG_PPC_MMUCFG | 32 2202 - PPC | KVM_REG_PPC_TLB0CFG | 32 2203 - PPC | KVM_REG_PPC_TLB1CFG | 32 2204 - PPC | KVM_REG_PPC_TLB2CFG | 32 2205 - PPC | KVM_REG_PPC_TLB3CFG | 32 2206 - PPC | KVM_REG_PPC_TLB0PS | 32 2207 - PPC | KVM_REG_PPC_TLB1PS | 32 2208 - PPC | KVM_REG_PPC_TLB2PS | 32 2209 - PPC | KVM_REG_PPC_TLB3PS | 32 2210 - PPC | KVM_REG_PPC_EPTCFG | 32 2211 - PPC | KVM_REG_PPC_ICP_STATE | 64 2212 - PPC | KVM_REG_PPC_VP_STATE | 128 2213 - PPC | KVM_REG_PPC_TB_OFFSET | 64 2214 - PPC | KVM_REG_PPC_SPMC1 | 32 2215 - PPC | KVM_REG_PPC_SPMC2 | 32 2216 - PPC | KVM_REG_PPC_IAMR | 64 2217 - PPC | KVM_REG_PPC_TFHAR | 64 2218 - PPC | KVM_REG_PPC_TFIAR | 64 2219 - PPC | KVM_REG_PPC_TEXASR | 64 2220 - PPC | KVM_REG_PPC_FSCR | 64 2221 - PPC | KVM_REG_PPC_PSPB | 32 2222 - PPC | KVM_REG_PPC_EBBHR | 64 2223 - PPC | KVM_REG_PPC_EBBRR | 64 2224 - PPC | KVM_REG_PPC_BESCR | 64 2225 - PPC | KVM_REG_PPC_TAR | 64 2226 - PPC | KVM_REG_PPC_DPDES | 64 2227 - PPC | KVM_REG_PPC_DAWR | 64 2228 - PPC | KVM_REG_PPC_DAWRX | 64 2229 - PPC | KVM_REG_PPC_CIABR | 64 2230 - PPC | KVM_REG_PPC_IC | 64 2231 - PPC | KVM_REG_PPC_VTB | 64 2232 - PPC | KVM_REG_PPC_CSIGR | 64 2233 - PPC | KVM_REG_PPC_TACR | 64 2234 - PPC | KVM_REG_PPC_TCSCR | 64 2235 - PPC | KVM_REG_PPC_PID | 64 2236 - PPC | KVM_REG_PPC_ACOP | 64 2237 - PPC | KVM_REG_PPC_VRSAVE | 32 
2238 - PPC | KVM_REG_PPC_LPCR | 32 2239 - PPC | KVM_REG_PPC_LPCR_64 | 64 2240 - PPC | KVM_REG_PPC_PPR | 64 2241 - PPC | KVM_REG_PPC_ARCH_COMPAT | 32 2242 - PPC | KVM_REG_PPC_DABRX | 32 2243 - PPC | KVM_REG_PPC_WORT | 64 2244 - PPC | KVM_REG_PPC_SPRG9 | 64 2245 - PPC | KVM_REG_PPC_DBSR | 32 2246 - PPC | KVM_REG_PPC_TIDR | 64 2247 - PPC | KVM_REG_PPC_PSSCR | 64 2248 - PPC | KVM_REG_PPC_DEC_EXPIRY | 64 2249 - PPC | KVM_REG_PPC_PTCR | 64 2250 - PPC | KVM_REG_PPC_TM_GPR0 | 64 2251 - ... 2252 - PPC | KVM_REG_PPC_TM_GPR31 | 64 2253 - PPC | KVM_REG_PPC_TM_VSR0 | 128 2254 - ... 2255 - PPC | KVM_REG_PPC_TM_VSR63 | 128 2256 - PPC | KVM_REG_PPC_TM_CR | 64 2257 - PPC | KVM_REG_PPC_TM_LR | 64 2258 - PPC | KVM_REG_PPC_TM_CTR | 64 2259 - PPC | KVM_REG_PPC_TM_FPSCR | 64 2260 - PPC | KVM_REG_PPC_TM_AMR | 64 2261 - PPC | KVM_REG_PPC_TM_PPR | 64 2262 - PPC | KVM_REG_PPC_TM_VRSAVE | 64 2263 - PPC | KVM_REG_PPC_TM_VSCR | 32 2264 - PPC | KVM_REG_PPC_TM_DSCR | 64 2265 - PPC | KVM_REG_PPC_TM_TAR | 64 2266 - PPC | KVM_REG_PPC_TM_XER | 64 2267 - | | 2268 - MIPS | KVM_REG_MIPS_R0 | 64 2269 - ... 
2270 - MIPS | KVM_REG_MIPS_R31 | 64 2271 - MIPS | KVM_REG_MIPS_HI | 64 2272 - MIPS | KVM_REG_MIPS_LO | 64 2273 - MIPS | KVM_REG_MIPS_PC | 64 2274 - MIPS | KVM_REG_MIPS_CP0_INDEX | 32 2275 - MIPS | KVM_REG_MIPS_CP0_ENTRYLO0 | 64 2276 - MIPS | KVM_REG_MIPS_CP0_ENTRYLO1 | 64 2277 - MIPS | KVM_REG_MIPS_CP0_CONTEXT | 64 2278 - MIPS | KVM_REG_MIPS_CP0_CONTEXTCONFIG| 32 2279 - MIPS | KVM_REG_MIPS_CP0_USERLOCAL | 64 2280 - MIPS | KVM_REG_MIPS_CP0_XCONTEXTCONFIG| 64 2281 - MIPS | KVM_REG_MIPS_CP0_PAGEMASK | 32 2282 - MIPS | KVM_REG_MIPS_CP0_PAGEGRAIN | 32 2283 - MIPS | KVM_REG_MIPS_CP0_SEGCTL0 | 64 2284 - MIPS | KVM_REG_MIPS_CP0_SEGCTL1 | 64 2285 - MIPS | KVM_REG_MIPS_CP0_SEGCTL2 | 64 2286 - MIPS | KVM_REG_MIPS_CP0_PWBASE | 64 2287 - MIPS | KVM_REG_MIPS_CP0_PWFIELD | 64 2288 - MIPS | KVM_REG_MIPS_CP0_PWSIZE | 64 2289 - MIPS | KVM_REG_MIPS_CP0_WIRED | 32 2290 - MIPS | KVM_REG_MIPS_CP0_PWCTL | 32 2291 - MIPS | KVM_REG_MIPS_CP0_HWRENA | 32 2292 - MIPS | KVM_REG_MIPS_CP0_BADVADDR | 64 2293 - MIPS | KVM_REG_MIPS_CP0_BADINSTR | 32 2294 - MIPS | KVM_REG_MIPS_CP0_BADINSTRP | 32 2295 - MIPS | KVM_REG_MIPS_CP0_COUNT | 32 2296 - MIPS | KVM_REG_MIPS_CP0_ENTRYHI | 64 2297 - MIPS | KVM_REG_MIPS_CP0_COMPARE | 32 2298 - MIPS | KVM_REG_MIPS_CP0_STATUS | 32 2299 - MIPS | KVM_REG_MIPS_CP0_INTCTL | 32 2300 - MIPS | KVM_REG_MIPS_CP0_CAUSE | 32 2301 - MIPS | KVM_REG_MIPS_CP0_EPC | 64 2302 - MIPS | KVM_REG_MIPS_CP0_PRID | 32 2303 - MIPS | KVM_REG_MIPS_CP0_EBASE | 64 2304 - MIPS | KVM_REG_MIPS_CP0_CONFIG | 32 2305 - MIPS | KVM_REG_MIPS_CP0_CONFIG1 | 32 2306 - MIPS | KVM_REG_MIPS_CP0_CONFIG2 | 32 2307 - MIPS | KVM_REG_MIPS_CP0_CONFIG3 | 32 2308 - MIPS | KVM_REG_MIPS_CP0_CONFIG4 | 32 2309 - MIPS | KVM_REG_MIPS_CP0_CONFIG5 | 32 2310 - MIPS | KVM_REG_MIPS_CP0_CONFIG7 | 32 2311 - MIPS | KVM_REG_MIPS_CP0_XCONTEXT | 64 2312 - MIPS | KVM_REG_MIPS_CP0_ERROREPC | 64 2313 - MIPS | KVM_REG_MIPS_CP0_KSCRATCH1 | 64 2314 - MIPS | KVM_REG_MIPS_CP0_KSCRATCH2 | 64 2315 - MIPS | KVM_REG_MIPS_CP0_KSCRATCH3 | 64 2316 
- MIPS | KVM_REG_MIPS_CP0_KSCRATCH4 | 64 2317 - MIPS | KVM_REG_MIPS_CP0_KSCRATCH5 | 64 2318 - MIPS | KVM_REG_MIPS_CP0_KSCRATCH6 | 64 2319 - MIPS | KVM_REG_MIPS_CP0_MAAR(0..63) | 64 2320 - MIPS | KVM_REG_MIPS_COUNT_CTL | 64 2321 - MIPS | KVM_REG_MIPS_COUNT_RESUME | 64 2322 - MIPS | KVM_REG_MIPS_COUNT_HZ | 64 2323 - MIPS | KVM_REG_MIPS_FPR_32(0..31) | 32 2324 - MIPS | KVM_REG_MIPS_FPR_64(0..31) | 64 2325 - MIPS | KVM_REG_MIPS_VEC_128(0..31) | 128 2326 - MIPS | KVM_REG_MIPS_FCR_IR | 32 2327 - MIPS | KVM_REG_MIPS_FCR_CSR | 32 2328 - MIPS | KVM_REG_MIPS_MSA_IR | 32 2329 - MIPS | KVM_REG_MIPS_MSA_CSR | 32 1939 + ======= =============================== ============ 1940 + Arch Register Width (bits) 1941 + ======= =============================== ============ 1942 + PPC KVM_REG_PPC_HIOR 64 1943 + PPC KVM_REG_PPC_IAC1 64 1944 + PPC KVM_REG_PPC_IAC2 64 1945 + PPC KVM_REG_PPC_IAC3 64 1946 + PPC KVM_REG_PPC_IAC4 64 1947 + PPC KVM_REG_PPC_DAC1 64 1948 + PPC KVM_REG_PPC_DAC2 64 1949 + PPC KVM_REG_PPC_DABR 64 1950 + PPC KVM_REG_PPC_DSCR 64 1951 + PPC KVM_REG_PPC_PURR 64 1952 + PPC KVM_REG_PPC_SPURR 64 1953 + PPC KVM_REG_PPC_DAR 64 1954 + PPC KVM_REG_PPC_DSISR 32 1955 + PPC KVM_REG_PPC_AMR 64 1956 + PPC KVM_REG_PPC_UAMOR 64 1957 + PPC KVM_REG_PPC_MMCR0 64 1958 + PPC KVM_REG_PPC_MMCR1 64 1959 + PPC KVM_REG_PPC_MMCRA 64 1960 + PPC KVM_REG_PPC_MMCR2 64 1961 + PPC KVM_REG_PPC_MMCRS 64 1962 + PPC KVM_REG_PPC_SIAR 64 1963 + PPC KVM_REG_PPC_SDAR 64 1964 + PPC KVM_REG_PPC_SIER 64 1965 + PPC KVM_REG_PPC_PMC1 32 1966 + PPC KVM_REG_PPC_PMC2 32 1967 + PPC KVM_REG_PPC_PMC3 32 1968 + PPC KVM_REG_PPC_PMC4 32 1969 + PPC KVM_REG_PPC_PMC5 32 1970 + PPC KVM_REG_PPC_PMC6 32 1971 + PPC KVM_REG_PPC_PMC7 32 1972 + PPC KVM_REG_PPC_PMC8 32 1973 + PPC KVM_REG_PPC_FPR0 64 1974 + ... 1975 + PPC KVM_REG_PPC_FPR31 64 1976 + PPC KVM_REG_PPC_VR0 128 1977 + ... 1978 + PPC KVM_REG_PPC_VR31 128 1979 + PPC KVM_REG_PPC_VSR0 128 1980 + ... 
1981 + PPC KVM_REG_PPC_VSR31 128 1982 + PPC KVM_REG_PPC_FPSCR 64 1983 + PPC KVM_REG_PPC_VSCR 32 1984 + PPC KVM_REG_PPC_VPA_ADDR 64 1985 + PPC KVM_REG_PPC_VPA_SLB 128 1986 + PPC KVM_REG_PPC_VPA_DTL 128 1987 + PPC KVM_REG_PPC_EPCR 32 1988 + PPC KVM_REG_PPC_EPR 32 1989 + PPC KVM_REG_PPC_TCR 32 1990 + PPC KVM_REG_PPC_TSR 32 1991 + PPC KVM_REG_PPC_OR_TSR 32 1992 + PPC KVM_REG_PPC_CLEAR_TSR 32 1993 + PPC KVM_REG_PPC_MAS0 32 1994 + PPC KVM_REG_PPC_MAS1 32 1995 + PPC KVM_REG_PPC_MAS2 64 1996 + PPC KVM_REG_PPC_MAS7_3 64 1997 + PPC KVM_REG_PPC_MAS4 32 1998 + PPC KVM_REG_PPC_MAS6 32 1999 + PPC KVM_REG_PPC_MMUCFG 32 2000 + PPC KVM_REG_PPC_TLB0CFG 32 2001 + PPC KVM_REG_PPC_TLB1CFG 32 2002 + PPC KVM_REG_PPC_TLB2CFG 32 2003 + PPC KVM_REG_PPC_TLB3CFG 32 2004 + PPC KVM_REG_PPC_TLB0PS 32 2005 + PPC KVM_REG_PPC_TLB1PS 32 2006 + PPC KVM_REG_PPC_TLB2PS 32 2007 + PPC KVM_REG_PPC_TLB3PS 32 2008 + PPC KVM_REG_PPC_EPTCFG 32 2009 + PPC KVM_REG_PPC_ICP_STATE 64 2010 + PPC KVM_REG_PPC_VP_STATE 128 2011 + PPC KVM_REG_PPC_TB_OFFSET 64 2012 + PPC KVM_REG_PPC_SPMC1 32 2013 + PPC KVM_REG_PPC_SPMC2 32 2014 + PPC KVM_REG_PPC_IAMR 64 2015 + PPC KVM_REG_PPC_TFHAR 64 2016 + PPC KVM_REG_PPC_TFIAR 64 2017 + PPC KVM_REG_PPC_TEXASR 64 2018 + PPC KVM_REG_PPC_FSCR 64 2019 + PPC KVM_REG_PPC_PSPB 32 2020 + PPC KVM_REG_PPC_EBBHR 64 2021 + PPC KVM_REG_PPC_EBBRR 64 2022 + PPC KVM_REG_PPC_BESCR 64 2023 + PPC KVM_REG_PPC_TAR 64 2024 + PPC KVM_REG_PPC_DPDES 64 2025 + PPC KVM_REG_PPC_DAWR 64 2026 + PPC KVM_REG_PPC_DAWRX 64 2027 + PPC KVM_REG_PPC_CIABR 64 2028 + PPC KVM_REG_PPC_IC 64 2029 + PPC KVM_REG_PPC_VTB 64 2030 + PPC KVM_REG_PPC_CSIGR 64 2031 + PPC KVM_REG_PPC_TACR 64 2032 + PPC KVM_REG_PPC_TCSCR 64 2033 + PPC KVM_REG_PPC_PID 64 2034 + PPC KVM_REG_PPC_ACOP 64 2035 + PPC KVM_REG_PPC_VRSAVE 32 2036 + PPC KVM_REG_PPC_LPCR 32 2037 + PPC KVM_REG_PPC_LPCR_64 64 2038 + PPC KVM_REG_PPC_PPR 64 2039 + PPC KVM_REG_PPC_ARCH_COMPAT 32 2040 + PPC KVM_REG_PPC_DABRX 32 2041 + PPC KVM_REG_PPC_WORT 64 2042 + PPC 
KVM_REG_PPC_SPRG9 64 2043 + PPC KVM_REG_PPC_DBSR 32 2044 + PPC KVM_REG_PPC_TIDR 64 2045 + PPC KVM_REG_PPC_PSSCR 64 2046 + PPC KVM_REG_PPC_DEC_EXPIRY 64 2047 + PPC KVM_REG_PPC_PTCR 64 2048 + PPC KVM_REG_PPC_TM_GPR0 64 2049 + ... 2050 + PPC KVM_REG_PPC_TM_GPR31 64 2051 + PPC KVM_REG_PPC_TM_VSR0 128 2052 + ... 2053 + PPC KVM_REG_PPC_TM_VSR63 128 2054 + PPC KVM_REG_PPC_TM_CR 64 2055 + PPC KVM_REG_PPC_TM_LR 64 2056 + PPC KVM_REG_PPC_TM_CTR 64 2057 + PPC KVM_REG_PPC_TM_FPSCR 64 2058 + PPC KVM_REG_PPC_TM_AMR 64 2059 + PPC KVM_REG_PPC_TM_PPR 64 2060 + PPC KVM_REG_PPC_TM_VRSAVE 64 2061 + PPC KVM_REG_PPC_TM_VSCR 32 2062 + PPC KVM_REG_PPC_TM_DSCR 64 2063 + PPC KVM_REG_PPC_TM_TAR 64 2064 + PPC KVM_REG_PPC_TM_XER 64 2065 + 2066 + MIPS KVM_REG_MIPS_R0 64 2067 + ... 2068 + MIPS KVM_REG_MIPS_R31 64 2069 + MIPS KVM_REG_MIPS_HI 64 2070 + MIPS KVM_REG_MIPS_LO 64 2071 + MIPS KVM_REG_MIPS_PC 64 2072 + MIPS KVM_REG_MIPS_CP0_INDEX 32 2073 + MIPS KVM_REG_MIPS_CP0_ENTRYLO0 64 2074 + MIPS KVM_REG_MIPS_CP0_ENTRYLO1 64 2075 + MIPS KVM_REG_MIPS_CP0_CONTEXT 64 2076 + MIPS KVM_REG_MIPS_CP0_CONTEXTCONFIG 32 2077 + MIPS KVM_REG_MIPS_CP0_USERLOCAL 64 2078 + MIPS KVM_REG_MIPS_CP0_XCONTEXTCONFIG 64 2079 + MIPS KVM_REG_MIPS_CP0_PAGEMASK 32 2080 + MIPS KVM_REG_MIPS_CP0_PAGEGRAIN 32 2081 + MIPS KVM_REG_MIPS_CP0_SEGCTL0 64 2082 + MIPS KVM_REG_MIPS_CP0_SEGCTL1 64 2083 + MIPS KVM_REG_MIPS_CP0_SEGCTL2 64 2084 + MIPS KVM_REG_MIPS_CP0_PWBASE 64 2085 + MIPS KVM_REG_MIPS_CP0_PWFIELD 64 2086 + MIPS KVM_REG_MIPS_CP0_PWSIZE 64 2087 + MIPS KVM_REG_MIPS_CP0_WIRED 32 2088 + MIPS KVM_REG_MIPS_CP0_PWCTL 32 2089 + MIPS KVM_REG_MIPS_CP0_HWRENA 32 2090 + MIPS KVM_REG_MIPS_CP0_BADVADDR 64 2091 + MIPS KVM_REG_MIPS_CP0_BADINSTR 32 2092 + MIPS KVM_REG_MIPS_CP0_BADINSTRP 32 2093 + MIPS KVM_REG_MIPS_CP0_COUNT 32 2094 + MIPS KVM_REG_MIPS_CP0_ENTRYHI 64 2095 + MIPS KVM_REG_MIPS_CP0_COMPARE 32 2096 + MIPS KVM_REG_MIPS_CP0_STATUS 32 2097 + MIPS KVM_REG_MIPS_CP0_INTCTL 32 2098 + MIPS KVM_REG_MIPS_CP0_CAUSE 32 2099 + MIPS 
KVM_REG_MIPS_CP0_EPC 64 2100 + MIPS KVM_REG_MIPS_CP0_PRID 32 2101 + MIPS KVM_REG_MIPS_CP0_EBASE 64 2102 + MIPS KVM_REG_MIPS_CP0_CONFIG 32 2103 + MIPS KVM_REG_MIPS_CP0_CONFIG1 32 2104 + MIPS KVM_REG_MIPS_CP0_CONFIG2 32 2105 + MIPS KVM_REG_MIPS_CP0_CONFIG3 32 2106 + MIPS KVM_REG_MIPS_CP0_CONFIG4 32 2107 + MIPS KVM_REG_MIPS_CP0_CONFIG5 32 2108 + MIPS KVM_REG_MIPS_CP0_CONFIG7 32 2109 + MIPS KVM_REG_MIPS_CP0_XCONTEXT 64 2110 + MIPS KVM_REG_MIPS_CP0_ERROREPC 64 2111 + MIPS KVM_REG_MIPS_CP0_KSCRATCH1 64 2112 + MIPS KVM_REG_MIPS_CP0_KSCRATCH2 64 2113 + MIPS KVM_REG_MIPS_CP0_KSCRATCH3 64 2114 + MIPS KVM_REG_MIPS_CP0_KSCRATCH4 64 2115 + MIPS KVM_REG_MIPS_CP0_KSCRATCH5 64 2116 + MIPS KVM_REG_MIPS_CP0_KSCRATCH6 64 2117 + MIPS KVM_REG_MIPS_CP0_MAAR(0..63) 64 2118 + MIPS KVM_REG_MIPS_COUNT_CTL 64 2119 + MIPS KVM_REG_MIPS_COUNT_RESUME 64 2120 + MIPS KVM_REG_MIPS_COUNT_HZ 64 2121 + MIPS KVM_REG_MIPS_FPR_32(0..31) 32 2122 + MIPS KVM_REG_MIPS_FPR_64(0..31) 64 2123 + MIPS KVM_REG_MIPS_VEC_128(0..31) 128 2124 + MIPS KVM_REG_MIPS_FCR_IR 32 2125 + MIPS KVM_REG_MIPS_FCR_CSR 32 2126 + MIPS KVM_REG_MIPS_MSA_IR 32 2127 + MIPS KVM_REG_MIPS_MSA_CSR 32 2128 + ======= =============================== ============ 2330 2129 2331 2130 ARM registers are mapped using the lower 32 bits. 
The upper 16 of that 2332 2131 is the register group type, or coprocessor number: 2333 2132 2334 - ARM core registers have the following id bit patterns: 2133 + ARM core registers have the following id bit patterns:: 2134 + 2335 2135 0x4020 0000 0010 <index into the kvm_regs struct:16> 2336 2136 2337 - ARM 32-bit CP15 registers have the following id bit patterns: 2137 + ARM 32-bit CP15 registers have the following id bit patterns:: 2138 + 2338 2139 0x4020 0000 000F <zero:1> <crn:4> <crm:4> <opc1:4> <opc2:3> 2339 2140 2340 - ARM 64-bit CP15 registers have the following id bit patterns: 2141 + ARM 64-bit CP15 registers have the following id bit patterns:: 2142 + 2341 2143 0x4030 0000 000F <zero:1> <zero:4> <crm:4> <opc1:4> <zero:3> 2342 2144 2343 - ARM CCSIDR registers are demultiplexed by CSSELR value: 2145 + ARM CCSIDR registers are demultiplexed by CSSELR value:: 2146 + 2344 2147 0x4020 0000 0011 00 <csselr:8> 2345 2148 2346 - ARM 32-bit VFP control registers have the following id bit patterns: 2149 + ARM 32-bit VFP control registers have the following id bit patterns:: 2150 + 2347 2151 0x4020 0000 0012 1 <regno:12> 2348 2152 2349 - ARM 64-bit FP registers have the following id bit patterns: 2153 + ARM 64-bit FP registers have the following id bit patterns:: 2154 + 2350 2155 0x4030 0000 0012 0 <regno:12> 2351 2156 2352 - ARM firmware pseudo-registers have the following bit pattern: 2157 + ARM firmware pseudo-registers have the following bit pattern:: 2158 + 2353 2159 0x4030 0000 0014 <regno:16> 2354 2160 2355 2161 ··· 2368 2156 arm64 core/FP-SIMD registers have the following id bit patterns. Note 2369 2157 that the size of the access is variable, as the kvm_regs structure 2370 2158 contains elements ranging from 32 to 128 bits. The index is a 32bit 2371 - value in the kvm_regs structure seen as a 32bit array. 
2159 + value in the kvm_regs structure seen as a 32bit array:: 2160 + 2372 2161 0x60x0 0000 0010 <index into the kvm_regs struct:16> 2373 2162 2374 2163 Specifically: 2164 + 2165 + ======================= ========= ===== ======================================= 2375 2166 Encoding Register Bits kvm_regs member 2376 - ---------------------------------------------------------------- 2167 + ======================= ========= ===== ======================================= 2377 2168 0x6030 0000 0010 0000 X0 64 regs.regs[0] 2378 2169 0x6030 0000 0010 0002 X1 64 regs.regs[1] 2379 - ... 2170 + ... 2380 2171 0x6030 0000 0010 003c X30 64 regs.regs[30] 2381 2172 0x6030 0000 0010 003e SP 64 regs.sp 2382 2173 0x6030 0000 0010 0040 PC 64 regs.pc ··· 2391 2176 0x6030 0000 0010 004c SPSR_UND 64 spsr[KVM_SPSR_UND] 2392 2177 0x6030 0000 0010 004e SPSR_IRQ 64 spsr[KVM_SPSR_IRQ] 2393 2178 0x6060 0000 0010 0050 SPSR_FIQ 64 spsr[KVM_SPSR_FIQ] 2394 - 0x6040 0000 0010 0054 V0 128 fp_regs.vregs[0] (*) 2395 - 0x6040 0000 0010 0058 V1 128 fp_regs.vregs[1] (*) 2396 - ... 2397 - 0x6040 0000 0010 00d0 V31 128 fp_regs.vregs[31] (*) 2179 + 0x6040 0000 0010 0054 V0 128 fp_regs.vregs[0] [1]_ 2180 + 0x6040 0000 0010 0058 V1 128 fp_regs.vregs[1] [1]_ 2181 + ... 2182 + 0x6040 0000 0010 00d0 V31 128 fp_regs.vregs[31] [1]_ 2398 2183 0x6020 0000 0010 00d4 FPSR 32 fp_regs.fpsr 2399 2184 0x6020 0000 0010 00d5 FPCR 32 fp_regs.fpcr 2185 + ======================= ========= ===== ======================================= 2400 2186 2401 - (*) These encodings are not accepted for SVE-enabled vcpus. See 2402 - KVM_ARM_VCPU_INIT. 2187 + .. [1] These encodings are not accepted for SVE-enabled vcpus. See 2188 + KVM_ARM_VCPU_INIT. 2403 2189 2404 - The equivalent register content can be accessed via bits [127:0] of 2405 - the corresponding SVE Zn registers instead for vcpus that have SVE 2406 - enabled (see below). 
2190 + The equivalent register content can be accessed via bits [127:0] of 2191 + the corresponding SVE Zn registers instead for vcpus that have SVE 2192 + enabled (see below). 2407 2193 2408 - arm64 CCSIDR registers are demultiplexed by CSSELR value: 2194 + arm64 CCSIDR registers are demultiplexed by CSSELR value:: 2195 + 2409 2196 0x6020 0000 0011 00 <csselr:8> 2410 2197 2411 - arm64 system registers have the following id bit patterns: 2198 + arm64 system registers have the following id bit patterns:: 2199 + 2412 2200 0x6030 0000 0013 <op0:2> <op1:3> <crn:4> <crm:4> <op2:3> 2413 2201 2414 - WARNING: 2202 + .. warning:: 2203 + 2415 2204 Two system register IDs do not follow the specified pattern. These 2416 2205 are KVM_REG_ARM_TIMER_CVAL and KVM_REG_ARM_TIMER_CNT, which map to 2417 2206 system registers CNTV_CVAL_EL0 and CNTVCT_EL0 respectively. These ··· 2424 2205 derived from the register encoding for CNTV_CVAL_EL0. As this is 2425 2206 API, it must remain this way. 2426 2207 2427 - arm64 firmware pseudo-registers have the following bit pattern: 2208 + arm64 firmware pseudo-registers have the following bit pattern:: 2209 + 2428 2210 0x6030 0000 0014 <regno:16> 2429 2211 2430 - arm64 SVE registers have the following bit patterns: 2212 + arm64 SVE registers have the following bit patterns:: 2213 + 2431 2214 0x6080 0000 0015 00 <n:5> <slice:5> Zn bits[2048*slice + 2047 : 2048*slice] 2432 2215 0x6050 0000 0015 04 <n:4> <slice:5> Pn bits[256*slice + 255 : 256*slice] 2433 2216 0x6050 0000 0015 060 <slice:5> FFR bits[256*slice + 255 : 256*slice] ··· 2437 2216 2438 2217 Access to register IDs where 2048 * slice >= 128 * max_vq will fail with 2439 2218 ENOENT. max_vq is the vcpu's maximum supported vector length in 128-bit 2440 - quadwords: see (**) below. 2219 + quadwords: see [2]_ below. 2441 2220 2442 2221 These registers are only accessible on vcpus for which SVE is enabled. 2443 2222 See KVM_ARM_VCPU_INIT for details. ··· 2452 2231 userspace. 
When transferred to or from user memory via KVM_GET_ONE_REG 2453 2232 or KVM_SET_ONE_REG, the value of this register is of type 2454 2233 __u64[KVM_ARM64_SVE_VLS_WORDS], and encodes the set of vector lengths as 2455 - follows: 2234 + follows:: 2456 2235 2457 - __u64 vector_lengths[KVM_ARM64_SVE_VLS_WORDS]; 2236 + __u64 vector_lengths[KVM_ARM64_SVE_VLS_WORDS]; 2458 2237 2459 - if (vq >= SVE_VQ_MIN && vq <= SVE_VQ_MAX && 2460 - ((vector_lengths[(vq - KVM_ARM64_SVE_VQ_MIN) / 64] >> 2238 + if (vq >= SVE_VQ_MIN && vq <= SVE_VQ_MAX && 2239 + ((vector_lengths[(vq - KVM_ARM64_SVE_VQ_MIN) / 64] >> 2461 2240 ((vq - KVM_ARM64_SVE_VQ_MIN) % 64)) & 1)) 2462 2241 /* Vector length vq * 16 bytes supported */ 2463 - else 2242 + else 2464 2243 /* Vector length vq * 16 bytes not supported */ 2465 2244 2466 - (**) The maximum value vq for which the above condition is true is 2467 - max_vq. This is the maximum vector length available to the guest on 2468 - this vcpu, and determines which register slices are visible through 2469 - this ioctl interface. 2245 + .. [2] The maximum value vq for which the above condition is true is 2246 + max_vq. This is the maximum vector length available to the guest on 2247 + this vcpu, and determines which register slices are visible through 2248 + this ioctl interface. 2470 2249 2471 2250 (See Documentation/arm64/sve.rst for an explanation of the "vq" 2472 2251 nomenclature.) ··· 2491 2270 MIPS registers are mapped using the lower 32 bits. 
The upper 16 of that is 2492 2271 the register group type: 2493 2272 2494 - MIPS core registers (see above) have the following id bit patterns: 2273 + MIPS core registers (see above) have the following id bit patterns:: 2274 + 2495 2275 0x7030 0000 0000 <reg:16> 2496 2276 2497 2277 MIPS CP0 registers (see KVM_REG_MIPS_CP0_* above) have the following id bit 2498 - patterns depending on whether they're 32-bit or 64-bit registers: 2278 + patterns depending on whether they're 32-bit or 64-bit registers:: 2279 + 2499 2280 0x7020 0000 0001 00 <reg:5> <sel:3> (32-bit) 2500 2281 0x7030 0000 0001 00 <reg:5> <sel:3> (64-bit) 2501 2282 ··· 2508 2285 the PFNX field starting at bit 30. 2509 2286 2510 2287 MIPS MAARs (see KVM_REG_MIPS_CP0_MAAR(*) above) have the following id bit 2511 - patterns: 2288 + patterns:: 2289 + 2512 2290 0x7030 0000 0001 01 <reg:8> 2513 2291 2514 - MIPS KVM control registers (see above) have the following id bit patterns: 2292 + MIPS KVM control registers (see above) have the following id bit patterns:: 2293 + 2515 2294 0x7030 0000 0002 <reg:16> 2516 2295 2517 2296 MIPS FPU registers (see KVM_REG_MIPS_FPR_{32,64}() above) have the following ··· 2522 2297 Config5.FRE), i.e. as the guest would see them, and they become unpredictable 2523 2298 if the guest FPU mode is changed. 
MIPS SIMD Architecture (MSA) vector 2524 2299 registers (see KVM_REG_MIPS_VEC_128() above) have similar patterns as they 2525 - overlap the FPU registers: 2300 + overlap the FPU registers:: 2301 + 2526 2302 0x7020 0000 0003 00 <0:3> <reg:5> (32-bit FPU registers) 2527 2303 0x7030 0000 0003 00 <0:3> <reg:5> (64-bit FPU registers) 2528 2304 0x7040 0000 0003 00 <0:3> <reg:5> (128-bit MSA vector registers) 2529 2305 2530 2306 MIPS FPU control registers (see KVM_REG_MIPS_FCR_{IR,CSR} above) have the 2531 - following id bit patterns: 2307 + following id bit patterns:: 2308 + 2532 2309 0x7020 0000 0003 01 <0:3> <reg:5> 2533 2310 2534 2311 MIPS MSA control registers (see KVM_REG_MIPS_MSA_{IR,CSR} above) have the 2535 - following id bit patterns: 2312 + following id bit patterns:: 2313 + 2536 2314 0x7020 0000 0003 02 <0:3> <reg:5> 2537 2315 2538 2316 2539 2317 4.69 KVM_GET_ONE_REG 2318 + -------------------- 2540 2319 2541 - Capability: KVM_CAP_ONE_REG 2542 - Architectures: all 2543 - Type: vcpu ioctl 2544 - Parameters: struct kvm_one_reg (in and out) 2545 - Returns: 0 on success, negative value on failure 2320 + :Capability: KVM_CAP_ONE_REG 2321 + :Architectures: all 2322 + :Type: vcpu ioctl 2323 + :Parameters: struct kvm_one_reg (in and out) 2324 + :Returns: 0 on success, negative value on failure 2325 + 2546 2326 Errors include: 2547 - ENOENT: no such register 2548 - EINVAL: invalid register ID, or no such register 2549 - EPERM: (arm64) register access not allowed before vcpu finalization 2327 + 2328 + ======== ============================================================ 2329 + ENOENT no such register 2330 + EINVAL invalid register ID, or no such register 2331 + EPERM (arm64) register access not allowed before vcpu finalization 2332 + ======== ============================================================ 2333 + 2550 2334 (These error codes are indicative only: do not rely on a specific error 2551 2335 code being returned in a specific situation.) 
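As a rough illustration of the MIPS FPU id bit patterns listed above, the following sketch composes register ids by OR-ing a register index into the low five bits of a base value. The base constants are copied from the patterns in the text; the ``EX_`` macro names and ``ex_mips_reg_id()`` are made up for this example, and real code would presumably use the ``KVM_REG_MIPS_*`` definitions from the uapi headers instead.

```c
#include <assert.h>
#include <stdint.h>

/* Base ids taken from the bit patterns in the text; names are hypothetical. */
#define EX_MIPS_FPR_32_BASE  0x7020000000030000ULL /* 32-bit FPU registers */
#define EX_MIPS_FPR_64_BASE  0x7030000000030000ULL /* 64-bit FPU registers */
#define EX_MIPS_VEC_128_BASE 0x7040000000030000ULL /* 128-bit MSA vector registers */
#define EX_MIPS_FCR_BASE     0x7020000000030100ULL /* FPU control registers */
#define EX_MIPS_MSACR_BASE   0x7020000000030200ULL /* MSA control registers */

/* Compose an id: <0:3> stays zero, <reg:5> occupies the low five bits. */
static uint64_t ex_mips_reg_id(uint64_t base, unsigned int reg)
{
    assert(reg < 32); /* the <reg:5> field only holds 0..31 */
    return base | (uint64_t)reg;
}
```

The resulting value is what would be stored in the ``id`` field of struct kvm_one_reg before issuing KVM_GET_ONE_REG or KVM_SET_ONE_REG.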
2552 2336 ··· 2569 2335 2570 2336 2571 2337 4.70 KVM_KVMCLOCK_CTRL 2338 + ---------------------- 2572 2339 2573 - Capability: KVM_CAP_KVMCLOCK_CTRL 2574 - Architectures: Any that implement pvclocks (currently x86 only) 2575 - Type: vcpu ioctl 2576 - Parameters: None 2577 - Returns: 0 on success, -1 on error 2340 + :Capability: KVM_CAP_KVMCLOCK_CTRL 2341 + :Architectures: Any that implement pvclocks (currently x86 only) 2342 + :Type: vcpu ioctl 2343 + :Parameters: None 2344 + :Returns: 0 on success, -1 on error 2578 2345 2579 2346 This signals to the host kernel that the specified guest is being paused by 2580 2347 userspace. The host will set a flag in the pvclock structure that is checked ··· 2591 2356 2592 2357 2593 2358 4.71 KVM_SIGNAL_MSI 2359 + ------------------- 2594 2360 2595 - Capability: KVM_CAP_SIGNAL_MSI 2596 - Architectures: x86 arm arm64 2597 - Type: vm ioctl 2598 - Parameters: struct kvm_msi (in) 2599 - Returns: >0 on delivery, 0 if guest blocked the MSI, and -1 on error 2361 + :Capability: KVM_CAP_SIGNAL_MSI 2362 + :Architectures: x86 arm arm64 2363 + :Type: vm ioctl 2364 + :Parameters: struct kvm_msi (in) 2365 + :Returns: >0 on delivery, 0 if guest blocked the MSI, and -1 on error 2600 2366 2601 2367 Directly inject a MSI message. Only valid with in-kernel irqchip that handles 2602 2368 MSI messages. 2603 2369 2604 - struct kvm_msi { 2370 + :: 2371 + 2372 + struct kvm_msi { 2605 2373 __u32 address_lo; 2606 2374 __u32 address_hi; 2607 2375 __u32 data; 2608 2376 __u32 flags; 2609 2377 __u32 devid; 2610 2378 __u8 pad[12]; 2611 - }; 2379 + }; 2612 2380 2613 - flags: KVM_MSI_VALID_DEVID: devid contains a valid value. The per-VM 2381 + flags: 2382 + KVM_MSI_VALID_DEVID: devid contains a valid value. The per-VM 2614 2383 KVM_CAP_MSI_DEVID capability advertises the requirement to provide 2615 2384 the device ID. If this capability is not available, userspace 2616 2385 should never set the KVM_MSI_VALID_DEVID flag as the ioctl might fail. 
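A minimal sketch of how userspace might populate the MSI message described above, assuming the struct layout shown in this section. ``struct ex_kvm_msi``, ``EX_KVM_MSI_VALID_DEVID`` and ``ex_fill_msi()`` are hypothetical stand-ins so that the example compiles on its own; real code would include ``<linux/kvm.h>`` and use struct kvm_msi and KVM_MSI_VALID_DEVID directly. The flag value used here is an assumption and should be checked against that header.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Local mirror of the struct kvm_msi layout shown above (hypothetical name). */
struct ex_kvm_msi {
    uint32_t address_lo;
    uint32_t address_hi;
    uint32_t data;
    uint32_t flags;
    uint32_t devid;
    uint8_t  pad[12];
};

/* Assumed value of KVM_MSI_VALID_DEVID; verify against <linux/kvm.h>. */
#define EX_KVM_MSI_VALID_DEVID (1U << 0)

/*
 * Fill an MSI message. have_devid should only be non-zero when the VM
 * advertises KVM_CAP_MSI_DEVID; otherwise the flag must stay clear.
 */
static void ex_fill_msi(struct ex_kvm_msi *msi, uint64_t addr, uint32_t data,
                        int have_devid, uint32_t devid)
{
    assert(msi != NULL);
    memset(msi, 0, sizeof(*msi)); /* also zeroes the pad bytes */
    msi->address_lo = (uint32_t)(addr & 0xffffffffULL);
    msi->address_hi = (uint32_t)(addr >> 32);
    msi->data = data;
    if (have_devid) {
        msi->flags |= EX_KVM_MSI_VALID_DEVID;
        msi->devid = devid;
    }
}
```

The filled structure would then be passed to the KVM_SIGNAL_MSI vm ioctl; zeroing the whole structure first keeps the padding and unused fields in a known state.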
··· 2630 2391 2631 2392 2632 2393 4.71 KVM_CREATE_PIT2 2394 + -------------------- 2633 2395 2634 - Capability: KVM_CAP_PIT2 2635 - Architectures: x86 2636 - Type: vm ioctl 2637 - Parameters: struct kvm_pit_config (in) 2638 - Returns: 0 on success, -1 on error 2396 + :Capability: KVM_CAP_PIT2 2397 + :Architectures: x86 2398 + :Type: vm ioctl 2399 + :Parameters: struct kvm_pit_config (in) 2400 + :Returns: 0 on success, -1 on error 2639 2401 2640 2402 Creates an in-kernel device model for the i8254 PIT. This call is only valid 2641 2403 after enabling in-kernel irqchip support via KVM_CREATE_IRQCHIP. The following 2642 - parameters have to be passed: 2404 + parameters have to be passed:: 2643 2405 2644 - struct kvm_pit_config { 2406 + struct kvm_pit_config { 2645 2407 __u32 flags; 2646 2408 __u32 pad[15]; 2647 - }; 2409 + }; 2648 2410 2649 - Valid flags are: 2411 + Valid flags are:: 2650 2412 2651 - #define KVM_PIT_SPEAKER_DUMMY 1 /* emulate speaker port stub */ 2413 + #define KVM_PIT_SPEAKER_DUMMY 1 /* emulate speaker port stub */ 2652 2414 2653 2415 PIT timer interrupts may use a per-VM kernel thread for injection. If it 2654 - exists, this thread will have a name of the following pattern: 2416 + exists, this thread will have a name of the following pattern:: 2655 2417 2656 - kvm-pit/<owner-process-pid> 2418 + kvm-pit/<owner-process-pid> 2657 2419 2658 2420 When running a guest with elevated priorities, the scheduling parameters of 2659 2421 this thread may have to be adjusted accordingly. 
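To adjust the scheduling parameters of the PIT injection thread, userspace first has to find it. The sketch below only covers the naming step: it builds the name from the ``kvm-pit/<owner-process-pid>`` pattern given above and compares it with a candidate thread name. Both helper names are hypothetical, and how the candidate name is obtained (for example by reading thread names from procfs) is left as an assumption outside this example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/* Build "kvm-pit/<owner-process-pid>" into buf; returns 0 on success, -1 if buf is too small. */
static int ex_pit_thread_name(char *buf, size_t len, pid_t owner)
{
    assert(buf != NULL && len > 0);
    int n = snprintf(buf, len, "kvm-pit/%ld", (long)owner);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Return 1 if comm matches the PIT thread name expected for owner, else 0. */
static int ex_is_pit_thread(const char *comm, pid_t owner)
{
    char expected[32];

    if (ex_pit_thread_name(expected, sizeof(expected), owner) != 0)
        return 0;
    return strcmp(comm, expected) == 0;
}
```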
··· 2663 2423 2664 2424 2665 2425 4.72 KVM_GET_PIT2 2426 + ----------------- 2666 2427 2667 - Capability: KVM_CAP_PIT_STATE2 2668 - Architectures: x86 2669 - Type: vm ioctl 2670 - Parameters: struct kvm_pit_state2 (out) 2671 - Returns: 0 on success, -1 on error 2428 + :Capability: KVM_CAP_PIT_STATE2 2429 + :Architectures: x86 2430 + :Type: vm ioctl 2431 + :Parameters: struct kvm_pit_state2 (out) 2432 + :Returns: 0 on success, -1 on error 2672 2433 2673 2434 Retrieves the state of the in-kernel PIT model. Only valid after 2674 - KVM_CREATE_PIT2. The state is returned in the following structure: 2435 + KVM_CREATE_PIT2. The state is returned in the following structure:: 2675 2436 2676 - struct kvm_pit_state2 { 2437 + struct kvm_pit_state2 { 2677 2438 struct kvm_pit_channel_state channels[3]; 2678 2439 __u32 flags; 2679 2440 __u32 reserved[9]; 2680 - }; 2441 + }; 2681 2442 2682 - Valid flags are: 2443 + Valid flags are:: 2683 2444 2684 - /* disable PIT in HPET legacy mode */ 2685 - #define KVM_PIT_FLAGS_HPET_LEGACY 0x00000001 2445 + /* disable PIT in HPET legacy mode */ 2446 + #define KVM_PIT_FLAGS_HPET_LEGACY 0x00000001 2686 2447 2687 2448 This IOCTL replaces the obsolete KVM_GET_PIT. 2688 2449 2689 2450 2690 2451 4.73 KVM_SET_PIT2 2452 + ----------------- 2691 2453 2692 - Capability: KVM_CAP_PIT_STATE2 2693 - Architectures: x86 2694 - Type: vm ioctl 2695 - Parameters: struct kvm_pit_state2 (in) 2696 - Returns: 0 on success, -1 on error 2454 + :Capability: KVM_CAP_PIT_STATE2 2455 + :Architectures: x86 2456 + :Type: vm ioctl 2457 + :Parameters: struct kvm_pit_state2 (in) 2458 + :Returns: 0 on success, -1 on error 2697 2459 2698 2460 Sets the state of the in-kernel PIT model. Only valid after KVM_CREATE_PIT2. 2699 2461 See KVM_GET_PIT2 for details on struct kvm_pit_state2. 
··· 2704 2462 2705 2463 2706 2464 4.74 KVM_PPC_GET_SMMU_INFO 2465 + -------------------------- 2707 2466 2708 - Capability: KVM_CAP_PPC_GET_SMMU_INFO 2709 - Architectures: powerpc 2710 - Type: vm ioctl 2711 - Parameters: None 2712 - Returns: 0 on success, -1 on error 2467 + :Capability: KVM_CAP_PPC_GET_SMMU_INFO 2468 + :Architectures: powerpc 2469 + :Type: vm ioctl 2470 + :Parameters: None 2471 + :Returns: 0 on success, -1 on error 2713 2472 2714 2473 This populates and returns a structure describing the features of 2715 2474 the "Server" class MMU emulation supported by KVM. ··· 2718 2475 device-tree properties for the guest operating system. 2719 2476 2720 2477 The structure contains some global information, followed by an 2721 - array of supported segment page sizes: 2478 + array of supported segment page sizes:: 2722 2479 2723 2480 struct kvm_ppc_smmu_info { 2724 2481 __u64 flags; ··· 2746 2503 2747 2504 The "sps" array contains 8 entries indicating the supported base 2748 2505 page sizes for a segment in increasing order. Each entry is defined 2749 - as follow: 2506 + as follow:: 2750 2507 2751 2508 struct kvm_ppc_one_seg_page_size { 2752 2509 __u32 page_shift; /* Base page shift of segment (or 0) */ ··· 2767 2524 only larger or equal to the base page size), along with the 2768 2525 corresponding encoding in the hash PTE. Similarly, the array is 2769 2526 8 entries sorted by increasing sizes and an entry with a "0" shift 2770 - is an empty entry and a terminator: 2527 + is an empty entry and a terminator:: 2771 2528 2772 2529 struct kvm_ppc_one_page_size { 2773 2530 __u32 page_shift; /* Page shift (or 0) */ ··· 2779 2536 into the hash PTE second double word). 
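The page size arrays described above are fixed at 8 entries, sorted by increasing size, with a zero shift marking an empty entry and terminator. A minimal sketch of walking such an array follows. ``struct ex_page_size`` is a hypothetical stand-in that carries only the ``page_shift`` field named in the text; the remaining fields of the real structures are deliberately left out, and the helper names are made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define EX_NR_PAGE_SIZES 8 /* array length stated in the text */

/* Hypothetical stand-in holding only the page_shift field. */
struct ex_page_size {
    uint32_t page_shift; /* Page shift (or 0) */
};

/* Count the valid entries: stop at the first zero shift or after 8 entries. */
static size_t ex_count_page_sizes(const struct ex_page_size *enc)
{
    size_t n = 0;

    assert(enc != NULL);
    while (n < EX_NR_PAGE_SIZES && enc[n].page_shift != 0)
        n++;
    return n;
}

/* Convert a page shift into a size in bytes. */
static uint64_t ex_page_bytes(uint32_t page_shift)
{
    assert(page_shift < 64);
    return 1ULL << page_shift;
}
```

A caller building device-tree properties for the guest could loop over the first ``ex_count_page_sizes()`` entries and ignore the rest.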
2780 2537 2781 2538 4.75 KVM_IRQFD 2539 + -------------- 2782 2540 2783 - Capability: KVM_CAP_IRQFD 2784 - Architectures: x86 s390 arm arm64 2785 - Type: vm ioctl 2786 - Parameters: struct kvm_irqfd (in) 2787 - Returns: 0 on success, -1 on error 2541 + :Capability: KVM_CAP_IRQFD 2542 + :Architectures: x86 s390 arm arm64 2543 + :Type: vm ioctl 2544 + :Parameters: struct kvm_irqfd (in) 2545 + :Returns: 0 on success, -1 on error 2788 2546 2789 2547 Allows setting an eventfd to directly trigger a guest interrupt. 2790 2548 kvm_irqfd.fd specifies the file descriptor to use as the eventfd and ··· 2809 2565 and need not be specified with KVM_IRQFD_FLAG_DEASSIGN. 2810 2566 2811 2567 On arm/arm64, gsi routing being supported, the following can happen: 2568 + 2812 2569 - in case no routing entry is associated to this gsi, injection fails 2813 2570 - in case the gsi is associated to an irqchip routing entry, 2814 2571 irqchip.pin + 32 corresponds to the injected SPI ID. ··· 2818 2573 to GICv3 ITS in-kernel emulation). 2819 2574 2820 2575 4.76 KVM_PPC_ALLOCATE_HTAB 2576 + -------------------------- 2821 2577 2822 - Capability: KVM_CAP_PPC_ALLOC_HTAB 2823 - Architectures: powerpc 2824 - Type: vm ioctl 2825 - Parameters: Pointer to u32 containing hash table order (in/out) 2826 - Returns: 0 on success, -1 on error 2578 + :Capability: KVM_CAP_PPC_ALLOC_HTAB 2579 + :Architectures: powerpc 2580 + :Type: vm ioctl 2581 + :Parameters: Pointer to u32 containing hash table order (in/out) 2582 + :Returns: 0 on success, -1 on error 2827 2583 2828 2584 This requests the host kernel to allocate an MMU hash table for a 2829 2585 guest using the PAPR paravirtualization interface. This only does ··· 2855 2609 HPTEs on the next KVM_RUN of any vcpu. 
2856 2610 2857 2611 4.77 KVM_S390_INTERRUPT 2612 + ----------------------- 2858 2613 2859 - Capability: basic 2860 - Architectures: s390 2861 - Type: vm ioctl, vcpu ioctl 2862 - Parameters: struct kvm_s390_interrupt (in) 2863 - Returns: 0 on success, -1 on error 2614 + :Capability: basic 2615 + :Architectures: s390 2616 + :Type: vm ioctl, vcpu ioctl 2617 + :Parameters: struct kvm_s390_interrupt (in) 2618 + :Returns: 0 on success, -1 on error 2864 2619 2865 2620 Allows to inject an interrupt to the guest. Interrupts can be floating 2866 2621 (vm ioctl) or per cpu (vcpu ioctl), depending on the interrupt type. 2867 2622 2868 - Interrupt parameters are passed via kvm_s390_interrupt: 2623 + Interrupt parameters are passed via kvm_s390_interrupt:: 2869 2624 2870 - struct kvm_s390_interrupt { 2625 + struct kvm_s390_interrupt { 2871 2626 __u32 type; 2872 2627 __u32 parm; 2873 2628 __u64 parm64; 2874 - }; 2629 + }; 2875 2630 2876 2631 type can be one of the following: 2877 2632 2878 - KVM_S390_SIGP_STOP (vcpu) - sigp stop; optional flags in parm 2879 - KVM_S390_PROGRAM_INT (vcpu) - program check; code in parm 2880 - KVM_S390_SIGP_SET_PREFIX (vcpu) - sigp set prefix; prefix address in parm 2881 - KVM_S390_RESTART (vcpu) - restart 2882 - KVM_S390_INT_CLOCK_COMP (vcpu) - clock comparator interrupt 2883 - KVM_S390_INT_CPU_TIMER (vcpu) - CPU timer interrupt 2884 - KVM_S390_INT_VIRTIO (vm) - virtio external interrupt; external interrupt 2885 - parameters in parm and parm64 2886 - KVM_S390_INT_SERVICE (vm) - sclp external interrupt; sclp parameter in parm 2887 - KVM_S390_INT_EMERGENCY (vcpu) - sigp emergency; source cpu in parm 2888 - KVM_S390_INT_EXTERNAL_CALL (vcpu) - sigp external call; source cpu in parm 2889 - KVM_S390_INT_IO(ai,cssid,ssid,schid) (vm) - compound value to indicate an 2890 - I/O interrupt (ai - adapter interrupt; cssid,ssid,schid - subchannel); 2891 - I/O interruption parameters in parm (subchannel) and parm64 (intparm, 2892 - interruption subclass) 2893 - 
KVM_S390_MCHK (vm, vcpu) - machine check interrupt; cr 14 bits in parm, 2894 - machine check interrupt code in parm64 (note that 2895 - machine checks needing further payload are not 2896 - supported by this ioctl) 2633 + KVM_S390_SIGP_STOP (vcpu) 2634 + - sigp stop; optional flags in parm 2635 + KVM_S390_PROGRAM_INT (vcpu) 2636 + - program check; code in parm 2637 + KVM_S390_SIGP_SET_PREFIX (vcpu) 2638 + - sigp set prefix; prefix address in parm 2639 + KVM_S390_RESTART (vcpu) 2640 + - restart 2641 + KVM_S390_INT_CLOCK_COMP (vcpu) 2642 + - clock comparator interrupt 2643 + KVM_S390_INT_CPU_TIMER (vcpu) 2644 + - CPU timer interrupt 2645 + KVM_S390_INT_VIRTIO (vm) 2646 + - virtio external interrupt; external interrupt 2647 + parameters in parm and parm64 2648 + KVM_S390_INT_SERVICE (vm) 2649 + - sclp external interrupt; sclp parameter in parm 2650 + KVM_S390_INT_EMERGENCY (vcpu) 2651 + - sigp emergency; source cpu in parm 2652 + KVM_S390_INT_EXTERNAL_CALL (vcpu) 2653 + - sigp external call; source cpu in parm 2654 + KVM_S390_INT_IO(ai,cssid,ssid,schid) (vm) 2655 + - compound value to indicate an 2656 + I/O interrupt (ai - adapter interrupt; cssid,ssid,schid - subchannel); 2657 + I/O interruption parameters in parm (subchannel) and parm64 (intparm, 2658 + interruption subclass) 2659 + KVM_S390_MCHK (vm, vcpu) 2660 + - machine check interrupt; cr 14 bits in parm, machine check interrupt 2661 + code in parm64 (note that machine checks needing further payload are not 2662 + supported by this ioctl) 2897 2663 2898 2664 This is an asynchronous vcpu ioctl and can be invoked from any thread. 
2899 2665 2900 2666 4.78 KVM_PPC_GET_HTAB_FD 2667 + ------------------------ 2901 2668 2902 - Capability: KVM_CAP_PPC_HTAB_FD 2903 - Architectures: powerpc 2904 - Type: vm ioctl 2905 - Parameters: Pointer to struct kvm_get_htab_fd (in) 2906 - Returns: file descriptor number (>= 0) on success, -1 on error 2669 + :Capability: KVM_CAP_PPC_HTAB_FD 2670 + :Architectures: powerpc 2671 + :Type: vm ioctl 2672 + :Parameters: Pointer to struct kvm_get_htab_fd (in) 2673 + :Returns: file descriptor number (>= 0) on success, -1 on error 2907 2674 2908 2675 This returns a file descriptor that can be used either to read out the 2909 2676 entries in the guest's hashed page table (HPT), or to write entries to 2910 2677 initialize the HPT. The returned fd can only be written to if the 2911 2678 KVM_GET_HTAB_WRITE bit is set in the flags field of the argument, and 2912 2679 can only be read if that bit is clear. The argument struct looks like 2913 - this: 2680 + this:: 2914 2681 2915 - /* For KVM_PPC_GET_HTAB_FD */ 2916 - struct kvm_get_htab_fd { 2682 + /* For KVM_PPC_GET_HTAB_FD */ 2683 + struct kvm_get_htab_fd { 2917 2684 __u64 flags; 2918 2685 __u64 start_index; 2919 2686 __u64 reserved[2]; 2920 - }; 2687 + }; 2921 2688 2922 - /* Values for kvm_get_htab_fd.flags */ 2923 - #define KVM_GET_HTAB_BOLTED_ONLY ((__u64)0x1) 2924 - #define KVM_GET_HTAB_WRITE ((__u64)0x2) 2689 + /* Values for kvm_get_htab_fd.flags */ 2690 + #define KVM_GET_HTAB_BOLTED_ONLY ((__u64)0x1) 2691 + #define KVM_GET_HTAB_WRITE ((__u64)0x2) 2925 2692 2926 - The `start_index' field gives the index in the HPT of the entry at 2693 + The 'start_index' field gives the index in the HPT of the entry at 2927 2694 which to start reading. It is ignored when writing. 2928 2695 2929 2696 Reads on the fd will initially supply information about all ··· 2951 2692 series of valid HPT entries (16 bytes) each. 
The header indicates how 2952 2693 many valid HPT entries there are and how many invalid entries follow 2953 2694 the valid entries. The invalid entries are not represented explicitly 2954 - in the stream. The header format is: 2695 + in the stream. The header format is:: 2955 2696 2956 - struct kvm_get_htab_header { 2697 + struct kvm_get_htab_header { 2957 2698 __u32 index; 2958 2699 __u16 n_valid; 2959 2700 __u16 n_invalid; 2960 - }; 2701 + }; 2961 2702 2962 2703 Writes to the fd create HPT entries starting at the index given in the 2963 - header; first `n_valid' valid entries with contents from the data 2964 - written, then `n_invalid' invalid entries, invalidating any previously 2704 + header; first 'n_valid' valid entries with contents from the data 2705 + written, then 'n_invalid' invalid entries, invalidating any previously 2965 2706 valid entries found. 2966 2707 2967 2708 4.79 KVM_CREATE_DEVICE 2709 + ---------------------- 2968 2710 2969 - Capability: KVM_CAP_DEVICE_CTRL 2970 - Type: vm ioctl 2971 - Parameters: struct kvm_create_device (in/out) 2972 - Returns: 0 on success, -1 on error 2711 + :Capability: KVM_CAP_DEVICE_CTRL 2712 + :Type: vm ioctl 2713 + :Parameters: struct kvm_create_device (in/out) 2714 + :Returns: 0 on success, -1 on error 2715 + 2973 2716 Errors: 2974 - ENODEV: The device type is unknown or unsupported 2975 - EEXIST: Device already created, and this type of device may not 2717 + 2718 + ====== ======================================================= 2719 + ENODEV The device type is unknown or unsupported 2720 + EEXIST Device already created, and this type of device may not 2976 2721 be instantiated multiple times 2722 + ====== ======================================================= 2977 2723 2978 2724 Other error conditions may be defined by individual device types or 2979 2725 have their standard meanings. ··· 2994 2730 for specifying any behavior that is not implied by the device type 2995 2731 number. 
2996 2732 2997 - struct kvm_create_device { 2733 + :: 2734 + 2735 + struct kvm_create_device { 2998 2736 __u32 type; /* in: KVM_DEV_TYPE_xxx */ 2999 2737 __u32 fd; /* out: device handle */ 3000 2738 __u32 flags; /* in: KVM_CREATE_DEVICE_xxx */ 3001 - }; 2739 + }; 3002 2740 3003 2741 4.80 KVM_SET_DEVICE_ATTR/KVM_GET_DEVICE_ATTR 2742 + -------------------------------------------- 3004 2743 3005 - Capability: KVM_CAP_DEVICE_CTRL, KVM_CAP_VM_ATTRIBUTES for vm device, 3006 - KVM_CAP_VCPU_ATTRIBUTES for vcpu device 3007 - Type: device ioctl, vm ioctl, vcpu ioctl 3008 - Parameters: struct kvm_device_attr 3009 - Returns: 0 on success, -1 on error 2744 + :Capability: KVM_CAP_DEVICE_CTRL, KVM_CAP_VM_ATTRIBUTES for vm device, 2745 + KVM_CAP_VCPU_ATTRIBUTES for vcpu device 2746 + :Type: device ioctl, vm ioctl, vcpu ioctl 2747 + :Parameters: struct kvm_device_attr 2748 + :Returns: 0 on success, -1 on error 2749 + 3010 2750 Errors: 3011 - ENXIO: The group or attribute is unknown/unsupported for this device 2751 + 2752 + ===== ============================================================= 2753 + ENXIO The group or attribute is unknown/unsupported for this device 3012 2754 or hardware support is missing. 3013 - EPERM: The attribute cannot (currently) be accessed this way 2755 + EPERM The attribute cannot (currently) be accessed this way 3014 2756 (e.g. read-only attribute, or attribute that only makes 3015 2757 sense when the device is in a different state) 2758 + ===== ============================================================= 3016 2759 3017 2760 Other error conditions may be defined by individual device types. 3018 2761 ··· 3028 2757 the "devices" directory. As with ONE_REG, the size of the data 3029 2758 transferred is defined by the particular attribute. 
3030 2759 3031 - struct kvm_device_attr { 2760 + :: 2761 + 2762 + struct kvm_device_attr { 3032 2763 __u32 flags; /* no flags currently defined */ 3033 2764 __u32 group; /* device-defined */ 3034 2765 __u64 attr; /* group-defined */ 3035 2766 __u64 addr; /* userspace address of attr data */ 3036 - }; 2767 + }; 3037 2768 3038 2769 4.81 KVM_HAS_DEVICE_ATTR 2770 + ------------------------ 3039 2771 3040 - Capability: KVM_CAP_DEVICE_CTRL, KVM_CAP_VM_ATTRIBUTES for vm device, 3041 - KVM_CAP_VCPU_ATTRIBUTES for vcpu device 3042 - Type: device ioctl, vm ioctl, vcpu ioctl 3043 - Parameters: struct kvm_device_attr 3044 - Returns: 0 on success, -1 on error 2772 + :Capability: KVM_CAP_DEVICE_CTRL, KVM_CAP_VM_ATTRIBUTES for vm device, 2773 + KVM_CAP_VCPU_ATTRIBUTES for vcpu device 2774 + :Type: device ioctl, vm ioctl, vcpu ioctl 2775 + :Parameters: struct kvm_device_attr 2776 + :Returns: 0 on success, -1 on error 2777 + 3045 2778 Errors: 3046 - ENXIO: The group or attribute is unknown/unsupported for this device 2779 + 2780 + ===== ============================================================= 2781 + ENXIO The group or attribute is unknown/unsupported for this device 3047 2782 or hardware support is missing. 2783 + ===== ============================================================= 3048 2784 3049 2785 Tests whether a device supports a particular attribute. A successful 3050 2786 return indicates the attribute is implemented. It does not necessarily ··· 3059 2781 current state. "addr" is ignored. 
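A minimal sketch of how userspace might fill in the attribute descriptor for these ioctls is shown below. The struct is a local mirror of the struct kvm_device_attr layout quoted earlier, and the helper names are hypothetical; a real program would include the kernel header and use the kernel's own type.

```c
#include <stdint.h>

/* Local mirror of the struct kvm_device_attr layout quoted in the text. */
struct device_attr {
    uint32_t flags;  /* no flags currently defined */
    uint32_t group;  /* device-defined */
    uint64_t attr;   /* group-defined */
    uint64_t addr;   /* userspace address of attr data */
};

/* Build a probe for KVM_HAS_DEVICE_ATTR. flags stays 0 because no flags
 * are defined, and addr stays 0 because the ioctl ignores it. */
static struct device_attr make_attr_probe(uint32_t group, uint64_t attr)
{
    struct device_attr probe = {0};

    probe.group = group;
    probe.attr = attr;
    return probe;
}

/* Build a request for KVM_SET_DEVICE_ATTR or KVM_GET_DEVICE_ATTR, where
 * addr carries the userspace address of the attribute data. */
static struct device_attr make_attr_access(uint32_t group, uint64_t attr,
                                           void *data)
{
    struct device_attr req = make_attr_probe(group, attr);

    req.addr = (uint64_t)(uintptr_t)data;
    return req;
}
```

One plausible pattern is to issue KVM_HAS_DEVICE_ATTR with the probe first and to attempt the get or set only when the probe succeeds, keeping in mind that a successful probe does not guarantee the access is possible in the device's current state.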
3060 2782 3061 2783 4.82 KVM_ARM_VCPU_INIT 2784 + ---------------------- 3062 2785 3063 - Capability: basic 3064 - Architectures: arm, arm64 3065 - Type: vcpu ioctl 3066 - Parameters: struct kvm_vcpu_init (in) 3067 - Returns: 0 on success; -1 on error 2786 + :Capability: basic 2787 + :Architectures: arm, arm64 2788 + :Type: vcpu ioctl 2789 + :Parameters: struct kvm_vcpu_init (in) 2790 + :Returns: 0 on success; -1 on error 2791 + 3068 2792 Errors: 3069 - EINVAL: the target is unknown, or the combination of features is invalid. 3070 - ENOENT: a features bit specified is unknown. 2793 + 2794 + ====== ================================================================= 2795 + EINVAL the target is unknown, or the combination of features is invalid. 2796 + ENOENT a features bit specified is unknown. 2797 + ====== ================================================================= 3071 2798 3072 2799 This tells KVM what type of CPU to present to the guest, and what 3073 2800 optional features it should have. This will cause a reset of the cpu ··· 3088 2805 target and same set of feature flags, otherwise EINVAL will be returned. 3089 2806 3090 2807 Possible features: 2808 + 3091 2809 - KVM_ARM_VCPU_POWER_OFF: Starts the CPU in a power-off state. 3092 2810 Depends on KVM_CAP_ARM_PSCI. If not set, the CPU will be powered on 3093 2811 and execute guest code when KVM_RUN is called. ··· 3145 2861 no longer be written using KVM_SET_ONE_REG. 
3146 2862 3147 2863 4.83 KVM_ARM_PREFERRED_TARGET 2864 + ----------------------------- 3148 2865 3149 - Capability: basic 3150 - Architectures: arm, arm64 3151 - Type: vm ioctl 3152 - Parameters: struct struct kvm_vcpu_init (out) 3153 - Returns: 0 on success; -1 on error 2866 + :Capability: basic 2867 + :Architectures: arm, arm64 2868 + :Type: vm ioctl 2869 + :Parameters: struct struct kvm_vcpu_init (out) 2870 + :Returns: 0 on success; -1 on error 2871 + 3154 2872 Errors: 3155 - ENODEV: no preferred target available for the host 2873 + 2874 + ====== ========================================== 2875 + ENODEV no preferred target available for the host 2876 + ====== ========================================== 3156 2877 3157 2878 This queries KVM for preferred CPU target type which can be emulated 3158 2879 by KVM on underlying host. ··· 3174 2885 3175 2886 3176 2887 4.84 KVM_GET_REG_LIST 2888 + --------------------- 3177 2889 3178 - Capability: basic 3179 - Architectures: arm, arm64, mips 3180 - Type: vcpu ioctl 3181 - Parameters: struct kvm_reg_list (in/out) 3182 - Returns: 0 on success; -1 on error 2890 + :Capability: basic 2891 + :Architectures: arm, arm64, mips 2892 + :Type: vcpu ioctl 2893 + :Parameters: struct kvm_reg_list (in/out) 2894 + :Returns: 0 on success; -1 on error 2895 + 3183 2896 Errors: 3184 - E2BIG: the reg index list is too big to fit in the array specified by 3185 - the user (the number required will be written into n). 3186 2897 3187 - struct kvm_reg_list { 2898 + ===== ============================================================== 2899 + E2BIG the reg index list is too big to fit in the array specified by 2900 + the user (the number required will be written into n). 
2901 + ===== ============================================================== 2902 + 2903 + :: 2904 + 2905 + struct kvm_reg_list { 3188 2906 __u64 n; /* number of registers in reg[] */ 3189 2907 __u64 reg[0]; 3190 - }; 2908 + }; 3191 2909 3192 2910 This ioctl returns the guest registers that are supported for the 3193 2911 KVM_GET_ONE_REG/KVM_SET_ONE_REG calls. 3194 2912 3195 2913 3196 2914 4.85 KVM_ARM_SET_DEVICE_ADDR (deprecated) 2915 + ----------------------------------------- 3197 2916 3198 - Capability: KVM_CAP_ARM_SET_DEVICE_ADDR 3199 - Architectures: arm, arm64 3200 - Type: vm ioctl 3201 - Parameters: struct kvm_arm_device_address (in) 3202 - Returns: 0 on success, -1 on error 2917 + :Capability: KVM_CAP_ARM_SET_DEVICE_ADDR 2918 + :Architectures: arm, arm64 2919 + :Type: vm ioctl 2920 + :Parameters: struct kvm_arm_device_address (in) 2921 + :Returns: 0 on success, -1 on error 2922 + 3203 2923 Errors: 3204 - ENODEV: The device id is unknown 3205 - ENXIO: Device not supported on current system 3206 - EEXIST: Address already set 3207 - E2BIG: Address outside guest physical address space 3208 - EBUSY: Address overlaps with other device range 3209 2924 3210 - struct kvm_arm_device_addr { 2925 + ====== ============================================ 2926 + ENODEV The device id is unknown 2927 + ENXIO Device not supported on current system 2928 + EEXIST Address already set 2929 + E2BIG Address outside guest physical address space 2930 + EBUSY Address overlaps with other device range 2931 + ====== ============================================ 2932 + 2933 + :: 2934 + 2935 + struct kvm_arm_device_addr { 3211 2936 __u64 id; 3212 2937 __u64 addr; 3213 - }; 2938 + }; 3214 2939 3215 2940 Specify a device address in the guest's physical address space where guests 3216 2941 can access emulated or directly exposed devices, which the host kernel needs ··· 3232 2929 specific device. 
3233 2930 3234 2931 ARM/arm64 divides the id field into two parts, a device id and an 3235 - address type id specific to the individual device. 2932 + address type id specific to the individual device:: 3236 2933 3237 2934 bits: | 63 ... 32 | 31 ... 16 | 15 ... 0 | 3238 2935 field: | 0x00000000 | device id | addr type id | ··· 3250 2947 3251 2948 3252 2949 4.86 KVM_PPC_RTAS_DEFINE_TOKEN 2950 + ------------------------------ 3253 2951 3254 - Capability: KVM_CAP_PPC_RTAS 3255 - Architectures: ppc 3256 - Type: vm ioctl 3257 - Parameters: struct kvm_rtas_token_args 3258 - Returns: 0 on success, -1 on error 2952 + :Capability: KVM_CAP_PPC_RTAS 2953 + :Architectures: ppc 2954 + :Type: vm ioctl 2955 + :Parameters: struct kvm_rtas_token_args 2956 + :Returns: 0 on success, -1 on error 3259 2957 3260 2958 Defines a token value for a RTAS (Run Time Abstraction Services) 3261 2959 service in order to allow it to be handled in the kernel. The ··· 3270 2966 handled. 3271 2967 3272 2968 4.87 KVM_SET_GUEST_DEBUG 2969 + ------------------------ 3273 2970 3274 - Capability: KVM_CAP_SET_GUEST_DEBUG 3275 - Architectures: x86, s390, ppc, arm64 3276 - Type: vcpu ioctl 3277 - Parameters: struct kvm_guest_debug (in) 3278 - Returns: 0 on success; -1 on error 2971 + :Capability: KVM_CAP_SET_GUEST_DEBUG 2972 + :Architectures: x86, s390, ppc, arm64 2973 + :Type: vcpu ioctl 2974 + :Parameters: struct kvm_guest_debug (in) 2975 + :Returns: 0 on success; -1 on error 3279 2976 3280 - struct kvm_guest_debug { 2977 + :: 2978 + 2979 + struct kvm_guest_debug { 3281 2980 __u32 control; 3282 2981 __u32 pad; 3283 2982 struct kvm_guest_debug_arch arch; 3284 - }; 2983 + }; 3285 2984 3286 2985 Set up the processor specific debug registers and configure vcpu for 3287 2986 handling guest debug events. There are two parts to the structure, the ··· 3326 3019 structure containing architecture specific debug information. 
3327 3020 3328 3021 4.88 KVM_GET_EMULATED_CPUID 3022 + --------------------------- 3329 3023 3330 - Capability: KVM_CAP_EXT_EMUL_CPUID 3331 - Architectures: x86 3332 - Type: system ioctl 3333 - Parameters: struct kvm_cpuid2 (in/out) 3334 - Returns: 0 on success, -1 on error 3024 + :Capability: KVM_CAP_EXT_EMUL_CPUID 3025 + :Architectures: x86 3026 + :Type: system ioctl 3027 + :Parameters: struct kvm_cpuid2 (in/out) 3028 + :Returns: 0 on success, -1 on error 3335 3029 3336 - struct kvm_cpuid2 { 3030 + :: 3031 + 3032 + struct kvm_cpuid2 { 3337 3033 __u32 nent; 3338 3034 __u32 flags; 3339 3035 struct kvm_cpuid_entry2 entries[0]; 3340 - }; 3036 + }; 3341 3037 3342 3038 The member 'flags' is used for passing flags from userspace. 3343 3039 3344 - #define KVM_CPUID_FLAG_SIGNIFCANT_INDEX BIT(0) 3345 - #define KVM_CPUID_FLAG_STATEFUL_FUNC BIT(1) 3346 - #define KVM_CPUID_FLAG_STATE_READ_NEXT BIT(2) 3040 + :: 3347 3041 3348 - struct kvm_cpuid_entry2 { 3042 + #define KVM_CPUID_FLAG_SIGNIFCANT_INDEX BIT(0) 3043 + #define KVM_CPUID_FLAG_STATEFUL_FUNC BIT(1) 3044 + #define KVM_CPUID_FLAG_STATE_READ_NEXT BIT(2) 3045 + 3046 + struct kvm_cpuid_entry2 { 3349 3047 __u32 function; 3350 3048 __u32 index; 3351 3049 __u32 flags; ··· 3359 3047 __u32 ecx; 3360 3048 __u32 edx; 3361 3049 __u32 padding[3]; 3362 - }; 3050 + }; 3363 3051 3364 3052 This ioctl returns x86 cpuid features which are emulated by 3365 3053 kvm.Userspace can use the information returned by this ioctl to query ··· 3384 3072 3385 3073 The fields in each entry are defined as follows: 3386 3074 3387 - function: the eax value used to obtain the entry 3388 - index: the ecx value used to obtain the entry (for entries that are 3075 + function: 3076 + the eax value used to obtain the entry 3077 + index: 3078 + the ecx value used to obtain the entry (for entries that are 3389 3079 affected by ecx) 3390 - flags: an OR of zero or more of the following: 3080 + flags: 3081 + an OR of zero or more of the following: 3082 + 3391 3083 
KVM_CPUID_FLAG_SIGNIFCANT_INDEX: 3392 3084 if the index field is valid 3393 3085 KVM_CPUID_FLAG_STATEFUL_FUNC: ··· 3401 3085 KVM_CPUID_FLAG_STATE_READ_NEXT: 3402 3086 for KVM_CPUID_FLAG_STATEFUL_FUNC entries, set if this entry is 3403 3087 the first entry to be read by a cpu 3404 - eax, ebx, ecx, edx: the values returned by the cpuid instruction for 3088 + 3089 + eax, ebx, ecx, edx: 3090 + 3091 + the values returned by the cpuid instruction for 3405 3092 this function/index combination 3406 3093 3407 3094 4.89 KVM_S390_MEM_OP 3095 + -------------------- 3408 3096 3409 - Capability: KVM_CAP_S390_MEM_OP 3410 - Architectures: s390 3411 - Type: vcpu ioctl 3412 - Parameters: struct kvm_s390_mem_op (in) 3413 - Returns: = 0 on success, 3414 - < 0 on generic error (e.g. -EFAULT or -ENOMEM), 3415 - > 0 if an exception occurred while walking the page tables 3097 + :Capability: KVM_CAP_S390_MEM_OP 3098 + :Architectures: s390 3099 + :Type: vcpu ioctl 3100 + :Parameters: struct kvm_s390_mem_op (in) 3101 + :Returns: = 0 on success, 3102 + < 0 on generic error (e.g. -EFAULT or -ENOMEM), 3103 + > 0 if an exception occurred while walking the page tables 3416 3104 3417 3105 Read or write data from/to the logical (virtual) memory of a VCPU. 3418 3106 3419 - Parameters are specified via the following structure: 3107 + Parameters are specified via the following structure:: 3420 3108 3421 - struct kvm_s390_mem_op { 3109 + struct kvm_s390_mem_op { 3422 3110 __u64 gaddr; /* the guest address */ 3423 3111 __u64 flags; /* flags */ 3424 3112 __u32 size; /* amount of bytes */ ··· 3430 3110 __u64 buf; /* buffer in userspace */ 3431 3111 __u8 ar; /* the access register number */ 3432 3112 __u8 reserved[31]; /* should be set to 0 */ 3433 - }; 3113 + }; 3434 3114 3435 3115 The type of operation is specified in the "op" field. It is either 3436 3116 KVM_S390_MEMOP_LOGICAL_READ for reading from logical memory space or ··· 3457 3137 KVM with the currently defined set of flags. 
3458 3138 3459 3139 4.90 KVM_S390_GET_SKEYS 3140 + ----------------------- 3460 3141 3461 - Capability: KVM_CAP_S390_SKEYS 3462 - Architectures: s390 3463 - Type: vm ioctl 3464 - Parameters: struct kvm_s390_skeys 3465 - Returns: 0 on success, KVM_S390_GET_KEYS_NONE if guest is not using storage 3466 - keys, negative value on error 3142 + :Capability: KVM_CAP_S390_SKEYS 3143 + :Architectures: s390 3144 + :Type: vm ioctl 3145 + :Parameters: struct kvm_s390_skeys 3146 + :Returns: 0 on success, KVM_S390_GET_KEYS_NONE if guest is not using storage 3147 + keys, negative value on error 3467 3148 3468 3149 This ioctl is used to get guest storage key values on the s390 3469 - architecture. The ioctl takes parameters via the kvm_s390_skeys struct. 3150 + architecture. The ioctl takes parameters via the kvm_s390_skeys struct:: 3470 3151 3471 - struct kvm_s390_skeys { 3152 + struct kvm_s390_skeys { 3472 3153 __u64 start_gfn; 3473 3154 __u64 count; 3474 3155 __u64 skeydata_addr; 3475 3156 __u32 flags; 3476 3157 __u32 reserved[9]; 3477 - }; 3158 + }; 3478 3159 3479 3160 The start_gfn field is the number of the first guest frame whose storage keys 3480 3161 you want to get. ··· 3489 3168 bytes. This buffer will be filled with storage key data by the ioctl. 3490 3169 3491 3170 4.91 KVM_S390_SET_SKEYS 3171 + ----------------------- 3492 3172 3493 - Capability: KVM_CAP_S390_SKEYS 3494 - Architectures: s390 3495 - Type: vm ioctl 3496 - Parameters: struct kvm_s390_skeys 3497 - Returns: 0 on success, negative value on error 3173 + :Capability: KVM_CAP_S390_SKEYS 3174 + :Architectures: s390 3175 + :Type: vm ioctl 3176 + :Parameters: struct kvm_s390_skeys 3177 + :Returns: 0 on success, negative value on error 3498 3178 3499 3179 This ioctl is used to set guest storage key values on the s390 3500 3180 architecture. The ioctl takes parameters via the kvm_s390_skeys struct. ··· 3517 3195 the ioctl will return -EINVAL. 
3518 3196 3519 3197 4.92 KVM_S390_IRQ 3198 + ----------------- 3520 3199 3521 - Capability: KVM_CAP_S390_INJECT_IRQ 3522 - Architectures: s390 3523 - Type: vcpu ioctl 3524 - Parameters: struct kvm_s390_irq (in) 3525 - Returns: 0 on success, -1 on error 3200 + :Capability: KVM_CAP_S390_INJECT_IRQ 3201 + :Architectures: s390 3202 + :Type: vcpu ioctl 3203 + :Parameters: struct kvm_s390_irq (in) 3204 + :Returns: 0 on success, -1 on error 3205 + 3526 3206 Errors: 3527 - EINVAL: interrupt type is invalid 3528 - type is KVM_S390_SIGP_STOP and flag parameter is invalid value 3207 + 3208 + 3209 + ====== ================================================================= 3210 + EINVAL interrupt type is invalid 3211 + type is KVM_S390_SIGP_STOP and flag parameter is invalid value, 3529 3212 type is KVM_S390_INT_EXTERNAL_CALL and code is bigger 3530 - than the maximum of VCPUs 3531 - EBUSY: type is KVM_S390_SIGP_SET_PREFIX and vcpu is not stopped 3532 - type is KVM_S390_SIGP_STOP and a stop irq is already pending 3213 + than the maximum of VCPUs 3214 + EBUSY type is KVM_S390_SIGP_SET_PREFIX and vcpu is not stopped, 3215 + type is KVM_S390_SIGP_STOP and a stop irq is already pending, 3533 3216 type is KVM_S390_INT_EXTERNAL_CALL and an external call interrupt 3534 - is already pending 3217 + is already pending 3218 + ====== ================================================================= 3535 3219 3536 3220 Allows to inject an interrupt to the guest. 3537 3221 ··· 3545 3217 to inject additional payload which is not 3546 3218 possible via KVM_S390_INTERRUPT. 
3547 3219 3548 - Interrupt parameters are passed via kvm_s390_irq: 3220 + Interrupt parameters are passed via kvm_s390_irq:: 3549 3221 3550 - struct kvm_s390_irq { 3222 + struct kvm_s390_irq { 3551 3223 __u64 type; 3552 3224 union { 3553 3225 struct kvm_s390_io_info io; ··· 3560 3232 struct kvm_s390_mchk_info mchk; 3561 3233 char reserved[64]; 3562 3234 } u; 3563 - }; 3235 + }; 3564 3236 3565 3237 type can be one of the following: 3566 3238 3567 - KVM_S390_SIGP_STOP - sigp stop; parameter in .stop 3568 - KVM_S390_PROGRAM_INT - program check; parameters in .pgm 3569 - KVM_S390_SIGP_SET_PREFIX - sigp set prefix; parameters in .prefix 3570 - KVM_S390_RESTART - restart; no parameters 3571 - KVM_S390_INT_CLOCK_COMP - clock comparator interrupt; no parameters 3572 - KVM_S390_INT_CPU_TIMER - CPU timer interrupt; no parameters 3573 - KVM_S390_INT_EMERGENCY - sigp emergency; parameters in .emerg 3574 - KVM_S390_INT_EXTERNAL_CALL - sigp external call; parameters in .extcall 3575 - KVM_S390_MCHK - machine check interrupt; parameters in .mchk 3239 + - KVM_S390_SIGP_STOP - sigp stop; parameter in .stop 3240 + - KVM_S390_PROGRAM_INT - program check; parameters in .pgm 3241 + - KVM_S390_SIGP_SET_PREFIX - sigp set prefix; parameters in .prefix 3242 + - KVM_S390_RESTART - restart; no parameters 3243 + - KVM_S390_INT_CLOCK_COMP - clock comparator interrupt; no parameters 3244 + - KVM_S390_INT_CPU_TIMER - CPU timer interrupt; no parameters 3245 + - KVM_S390_INT_EMERGENCY - sigp emergency; parameters in .emerg 3246 + - KVM_S390_INT_EXTERNAL_CALL - sigp external call; parameters in .extcall 3247 + - KVM_S390_MCHK - machine check interrupt; parameters in .mchk 3576 3248 3577 3249 This is an asynchronous vcpu ioctl and can be invoked from any thread. 
3578 3250 3579 3251 4.94 KVM_S390_GET_IRQ_STATE 3252 + --------------------------- 3580 3253 3581 - Capability: KVM_CAP_S390_IRQ_STATE 3582 - Architectures: s390 3583 - Type: vcpu ioctl 3584 - Parameters: struct kvm_s390_irq_state (out) 3585 - Returns: >= number of bytes copied into buffer, 3586 - -EINVAL if buffer size is 0, 3587 - -ENOBUFS if buffer size is too small to fit all pending interrupts, 3588 - -EFAULT if the buffer address was invalid 3254 + :Capability: KVM_CAP_S390_IRQ_STATE 3255 + :Architectures: s390 3256 + :Type: vcpu ioctl 3257 + :Parameters: struct kvm_s390_irq_state (out) 3258 + :Returns: >= number of bytes copied into buffer, 3259 + -EINVAL if buffer size is 0, 3260 + -ENOBUFS if buffer size is too small to fit all pending interrupts, 3261 + -EFAULT if the buffer address was invalid 3589 3262 3590 3263 This ioctl allows userspace to retrieve the complete state of all currently 3591 3264 pending interrupts in a single buffer. Use cases include migration 3592 3265 and introspection. The parameter structure contains the address of a 3593 - userspace buffer and its length: 3266 + userspace buffer and its length:: 3594 3267 3595 - struct kvm_s390_irq_state { 3268 + struct kvm_s390_irq_state { 3596 3269 __u64 buf; 3597 3270 __u32 flags; /* will stay unused for compatibility reasons */ 3598 3271 __u32 len; 3599 3272 __u32 reserved[4]; /* will stay unused for compatibility reasons */ 3600 - }; 3273 + }; 3601 3274 3602 3275 Userspace passes in the above struct and for each pending interrupt a 3603 3276 struct kvm_s390_irq is copied to the provided buffer. ··· 3612 3283 may retry with a bigger buffer. 
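The retry-with-a-bigger-buffer pattern mentioned above could look roughly like this. The callback type is a hypothetical stand-in for a wrapper around the real ioctl and is assumed to follow the return convention listed for KVM_S390_GET_IRQ_STATE (bytes copied on success, a negative errno value on failure); fake_query exists only to exercise the loop and does not talk to KVM.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical callback wrapping the real query: returns the number of
 * bytes copied into buf, or a negative errno value on failure. */
typedef long (*irq_state_query_fn)(void *buf, uint32_t len, void *ctx);

/* Fetch the pending-interrupt state, doubling the buffer on -ENOBUFS up to
 * max_len. On success *out owns a malloc'd buffer and the byte count is
 * returned; on failure *out is NULL and a negative errno value is returned. */
static long fetch_irq_state(irq_state_query_fn query, void *ctx,
                            uint32_t initial_len, uint32_t max_len,
                            void **out)
{
    /* A zero-sized buffer is documented to fail with -EINVAL, so avoid it. */
    uint32_t len = initial_len ? initial_len : 1;

    *out = NULL;
    for (;;) {
        void *buf = malloc(len);
        if (!buf)
            return -ENOMEM;

        long ret = query(buf, len, ctx);
        if (ret >= 0) {
            *out = buf;
            return ret;
        }
        free(buf);

        /* Only a too-small buffer is worth retrying, and only below the cap. */
        if (ret != -ENOBUFS || len >= max_len)
            return ret;
        len = (len > max_len / 2) ? max_len : len * 2;
    }
}

/* Stand-in used only to exercise the loop: it pretends that
 * *(uint32_t *)ctx bytes of interrupt state are pending. */
static long fake_query(void *buf, uint32_t len, void *ctx)
{
    uint32_t pending = *(uint32_t *)ctx;

    if (len == 0)
        return -EINVAL;
    if (len < pending)
        return -ENOBUFS;
    memset(buf, 0, pending);
    return (long)pending;
}
```

Capping the growth at max_len keeps a misbehaving query from driving unbounded allocation; the caller chooses the cap.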
3613 3284 3614 3285 4.95 KVM_S390_SET_IRQ_STATE 3286 + --------------------------- 3615 3287 3616 - Capability: KVM_CAP_S390_IRQ_STATE 3617 - Architectures: s390 3618 - Type: vcpu ioctl 3619 - Parameters: struct kvm_s390_irq_state (in) 3620 - Returns: 0 on success, 3621 - -EFAULT if the buffer address was invalid, 3622 - -EINVAL for an invalid buffer length (see below), 3623 - -EBUSY if there were already interrupts pending, 3624 - errors occurring when actually injecting the 3288 + :Capability: KVM_CAP_S390_IRQ_STATE 3289 + :Architectures: s390 3290 + :Type: vcpu ioctl 3291 + :Parameters: struct kvm_s390_irq_state (in) 3292 + :Returns: 0 on success, 3293 + -EFAULT if the buffer address was invalid, 3294 + -EINVAL for an invalid buffer length (see below), 3295 + -EBUSY if there were already interrupts pending, 3296 + errors occurring when actually injecting the 3625 3297 interrupt. See KVM_S390_IRQ. 3626 3298 3627 3299 This ioctl allows userspace to set the complete state of all cpu-local 3628 3300 interrupts currently pending for the vcpu. It is intended for restoring 3629 3301 interrupt state after a migration. The input parameter is a userspace buffer 3630 - containing a struct kvm_s390_irq_state: 3302 + containing a struct kvm_s390_irq_state:: 3631 3303 3632 - struct kvm_s390_irq_state { 3304 + struct kvm_s390_irq_state { 3633 3305 __u64 buf; 3634 3306 __u32 flags; /* will stay unused for compatibility reasons */ 3635 3307 __u32 len; 3636 3308 __u32 reserved[4]; /* will stay unused for compatibility reasons */ 3637 - }; 3309 + }; 3638 3310 3639 3311 The restrictions for flags and reserved apply as well. 3640 3312 (see KVM_S390_GET_IRQ_STATE) ··· 3650 3320 which is the maximum number of possibly pending cpu-local interrupts. 
3651 3321 3652 3322 4.96 KVM_SMI 3323 + ------------ 3653 3324 3654 - Capability: KVM_CAP_X86_SMM 3655 - Architectures: x86 3656 - Type: vcpu ioctl 3657 - Parameters: none 3658 - Returns: 0 on success, -1 on error 3325 + :Capability: KVM_CAP_X86_SMM 3326 + :Architectures: x86 3327 + :Type: vcpu ioctl 3328 + :Parameters: none 3329 + :Returns: 0 on success, -1 on error 3659 3330 3660 3331 Queues an SMI on the thread's vcpu. 3661 3332 3662 3333 4.97 KVM_CAP_PPC_MULTITCE 3334 + ------------------------- 3663 3335 3664 - Capability: KVM_CAP_PPC_MULTITCE 3665 - Architectures: ppc 3666 - Type: vm 3336 + :Capability: KVM_CAP_PPC_MULTITCE 3337 + :Architectures: ppc 3338 + :Type: vm 3667 3339 3668 3340 This capability means the kernel is capable of handling hypercalls 3669 3341 H_PUT_TCE_INDIRECT and H_STUFF_TCE without passing those into the user ··· 3687 3355 This capability is always enabled. 3688 3356 3689 3357 4.98 KVM_CREATE_SPAPR_TCE_64 3358 + ---------------------------- 3690 3359 3691 - Capability: KVM_CAP_SPAPR_TCE_64 3692 - Architectures: powerpc 3693 - Type: vm ioctl 3694 - Parameters: struct kvm_create_spapr_tce_64 (in) 3695 - Returns: file descriptor for manipulating the created TCE table 3360 + :Capability: KVM_CAP_SPAPR_TCE_64 3361 + :Architectures: powerpc 3362 + :Type: vm ioctl 3363 + :Parameters: struct kvm_create_spapr_tce_64 (in) 3364 + :Returns: file descriptor for manipulating the created TCE table 3696 3365 3697 3366 This is an extension for KVM_CAP_SPAPR_TCE which only supports 32bit 3698 3367 windows, described in 4.62 KVM_CREATE_SPAPR_TCE 3699 3368 3700 - This capability uses extended struct in ioctl interface: 3369 + This capability uses extended struct in ioctl interface:: 3701 3370 3702 - /* for KVM_CAP_SPAPR_TCE_64 */ 3703 - struct kvm_create_spapr_tce_64 { 3371 + /* for KVM_CAP_SPAPR_TCE_64 */ 3372 + struct kvm_create_spapr_tce_64 { 3704 3373 __u64 liobn; 3705 3374 __u32 page_shift; 3706 3375 __u32 flags; 3707 3376 __u64 offset; /* in pages */ 
3708 3377 __u64 size; /* in pages */ 3709 - }; 3378 + }; 3710 3379 3711 3380 The aim of extension is to support an additional bigger DMA window with 3712 3381 a variable page size. ··· 3720 3387 The rest of functionality is identical to KVM_CREATE_SPAPR_TCE. 3721 3388 3722 3389 4.99 KVM_REINJECT_CONTROL 3390 + ------------------------- 3723 3391 3724 - Capability: KVM_CAP_REINJECT_CONTROL 3725 - Architectures: x86 3726 - Type: vm ioctl 3727 - Parameters: struct kvm_reinject_control (in) 3728 - Returns: 0 on success, 3392 + :Capability: KVM_CAP_REINJECT_CONTROL 3393 + :Architectures: x86 3394 + :Type: vm ioctl 3395 + :Parameters: struct kvm_reinject_control (in) 3396 + :Returns: 0 on success, 3729 3397 -EFAULT if struct kvm_reinject_control cannot be read, 3730 3398 -ENXIO if KVM_CREATE_PIT or KVM_CREATE_PIT2 didn't succeed earlier. 3731 3399 ··· 3736 3402 interrupt whenever there isn't a pending interrupt from i8254. 3737 3403 !reinject mode injects an interrupt as soon as a tick arrives. 3738 3404 3739 - struct kvm_reinject_control { 3405 + :: 3406 + 3407 + struct kvm_reinject_control { 3740 3408 __u8 pit_reinject; 3741 3409 __u8 reserved[31]; 3742 - }; 3410 + }; 3743 3411 3744 3412 pit_reinject = 0 (!reinject mode) is recommended, unless running an old 3745 3413 operating system that uses the PIT for timing (e.g. Linux 2.4.x). 
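As a small sketch of preparing the KVM_REINJECT_CONTROL argument: the struct below is a local mirror of the documented layout, and `make_reinject_control` is a hypothetical helper name, not an existing API. It only shows zeroing the reserved bytes and choosing the mode; issuing the vm ioctl itself is left out.

```c
#include <stdint.h>
#include <string.h>

/* Local mirror of the documented struct; real code would use <linux/kvm.h>. */
struct reinject_control_sketch {
    uint8_t pit_reinject;
    uint8_t reserved[31];
};

/* reinject != 0 selects reinject mode, 0 selects !reinject mode. */
static struct reinject_control_sketch make_reinject_control(int reinject)
{
    struct reinject_control_sketch rc;

    memset(&rc, 0, sizeof(rc));          /* reserved bytes stay zero */
    rc.pit_reinject = reinject ? 1 : 0;
    return rc;
}
```

Following the recommendation above, most guests would be configured with `make_reinject_control(0)`.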
3746 3414 3747 3415 4.100 KVM_PPC_CONFIGURE_V3_MMU 3416 + ------------------------------ 3748 3417 3749 - Capability: KVM_CAP_PPC_RADIX_MMU or KVM_CAP_PPC_HASH_MMU_V3 3750 - Architectures: ppc 3751 - Type: vm ioctl 3752 - Parameters: struct kvm_ppc_mmuv3_cfg (in) 3753 - Returns: 0 on success, 3418 + :Capability: KVM_CAP_PPC_RADIX_MMU or KVM_CAP_PPC_HASH_MMU_V3 3419 + :Architectures: ppc 3420 + :Type: vm ioctl 3421 + :Parameters: struct kvm_ppc_mmuv3_cfg (in) 3422 + :Returns: 0 on success, 3754 3423 -EFAULT if struct kvm_ppc_mmuv3_cfg cannot be read, 3755 3424 -EINVAL if the configuration is invalid 3756 3425 ··· 3761 3424 page table) translation, and sets the pointer to the process table for 3762 3425 the guest. 3763 3426 3764 - struct kvm_ppc_mmuv3_cfg { 3427 + :: 3428 + 3429 + struct kvm_ppc_mmuv3_cfg { 3765 3430 __u64 flags; 3766 3431 __u64 process_table; 3767 - }; 3432 + }; 3768 3433 3769 3434 There are two bits that can be set in flags; KVM_PPC_MMUV3_RADIX and 3770 3435 KVM_PPC_MMUV3_GTSE. KVM_PPC_MMUV3_RADIX, if set, configures the guest ··· 3781 3442 the Power ISA V3.00, Book III section 5.7.6.1. 3782 3443 3783 3444 4.101 KVM_PPC_GET_RMMU_INFO 3445 + --------------------------- 3784 3446 3785 - Capability: KVM_CAP_PPC_RADIX_MMU 3786 - Architectures: ppc 3787 - Type: vm ioctl 3788 - Parameters: struct kvm_ppc_rmmu_info (out) 3789 - Returns: 0 on success, 3447 + :Capability: KVM_CAP_PPC_RADIX_MMU 3448 + :Architectures: ppc 3449 + :Type: vm ioctl 3450 + :Parameters: struct kvm_ppc_rmmu_info (out) 3451 + :Returns: 0 on success, 3790 3452 -EFAULT if struct kvm_ppc_rmmu_info cannot be written, 3791 3453 -EINVAL if no useful information can be returned 3792 3454 ··· 3796 3456 page sizes to put in the "AP" (actual page size) field for the tlbie 3797 3457 (TLB invalidate entry) instruction. 
3798 3458 3799 - struct kvm_ppc_rmmu_info { 3459 + :: 3460 + 3461 + struct kvm_ppc_rmmu_info { 3800 3462 struct kvm_ppc_radix_geom { 3801 3463 __u8 page_shift; 3802 3464 __u8 level_bits[4]; 3803 3465 __u8 pad[3]; 3804 3466 } geometries[8]; 3805 3467 __u32 ap_encodings[8]; 3806 - }; 3468 + }; 3807 3469 3808 3470 The geometries[] field gives up to 8 supported geometries for the 3809 3471 radix page table, in terms of the log base 2 of the smallest page ··· 3818 3476 base 2 of the page size in the bottom 6 bits. 3819 3477 3820 3478 4.102 KVM_PPC_RESIZE_HPT_PREPARE 3479 + -------------------------------- 3821 3480 3822 - Capability: KVM_CAP_SPAPR_RESIZE_HPT 3823 - Architectures: powerpc 3824 - Type: vm ioctl 3825 - Parameters: struct kvm_ppc_resize_hpt (in) 3826 - Returns: 0 on successful completion, 3481 + :Capability: KVM_CAP_SPAPR_RESIZE_HPT 3482 + :Architectures: powerpc 3483 + :Type: vm ioctl 3484 + :Parameters: struct kvm_ppc_resize_hpt (in) 3485 + :Returns: 0 on successful completion, 3827 3486 >0 if a new HPT is being prepared, the value is an estimated 3828 - number of milliseconds until preparation is complete 3487 + number of milliseconds until preparation is complete, 3829 3488 -EFAULT if struct kvm_ppc_resize_hpt cannot be read, 3830 - -EINVAL if the supplied shift or flags are invalid 3831 - -ENOMEM if unable to allocate the new HPT 3832 - -ENOSPC if there was a hash collision when moving existing 3833 - HPT entries to the new HPT 3489 + -EINVAL if the supplied shift or flags are invalid, 3490 + -ENOMEM if unable to allocate the new HPT, 3491 + -ENOSPC if there was a hash collision when moving existing 3492 + HPT entries to the new HPT, 3834 3527 -EIO on other error conditions 3835 3528 3836 3529 Used to implement the PAPR extension for runtime resizing of a guest's ··· 3883 3506 creates a new one as above. 3884 3507 3885 3508 If called when there is a pending HPT of the size requested, will: 3509 + 3886 3510 * If preparation of the pending HPT is already complete, return 0 3887 3511 * If preparation of the pending HPT has failed, return an error 3888 3512 code, then discard the pending HPT. ··· 3900 3522 it returns <= 0. The first call will initiate preparation, subsequent 3901 3523 ones will monitor preparation until it completes or fails.
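The "call repeatedly until it returns <= 0" usage of KVM_PPC_RESIZE_HPT_PREPARE might look like the sketch below. Everything here is illustrative: `resize_hpt_sketch` mirrors the documented struct, `prepare_fn` and `wait_ms_fn` are hypothetical stand-ins for the ioctl call and for sleeping, and the `fake_*` helpers only simulate a preparation that finishes after three polls. Leaving `flags` at zero is an assumption of this sketch.

```c
#include <stdint.h>

/* Local mirror of the documented struct; real code would use <linux/kvm.h>. */
struct resize_hpt_sketch {
    uint64_t flags;
    uint32_t shift;
    uint32_t pad;
};

/* Hypothetical stand-ins for the vm ioctl and for a millisecond sleep. */
typedef int (*prepare_fn)(const struct resize_hpt_sketch *req);
typedef void (*wait_ms_fn)(int ms);

/* Demo-only fakes: report "10 ms remaining" three times, then success. */
static int fake_calls_left = 3;
static int fake_waited_ms;

static int fake_prepare(const struct resize_hpt_sketch *req)
{
    (void)req;
    return fake_calls_left-- > 0 ? 10 : 0;
}

static void fake_wait(int ms)
{
    fake_waited_ms += ms;
}

/*
 * Poll with identical parameters until the result is <= 0.
 * Returns 0 on completion, a negative errno on failure, or 1 if
 * preparation is still pending after max_calls attempts.
 */
static int prepare_hpt(prepare_fn prepare, wait_ms_fn wait_ms,
                       uint32_t shift, int max_calls)
{
    struct resize_hpt_sketch req = { .flags = 0, .shift = shift, .pad = 0 };

    for (int i = 0; i < max_calls; i++) {
        int ret = prepare(&req);
        if (ret <= 0)
            return ret;      /* 0: ready, <0: preparation failed */
        wait_ms(ret);        /* positive value: estimated ms remaining */
    }
    return 1;
}
```

A real caller would follow a successful return with KVM_PPC_RESIZE_HPT_COMMIT using the same shift.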
3902 3524 3903 - struct kvm_ppc_resize_hpt { 3525 + :: 3526 + 3527 + struct kvm_ppc_resize_hpt { 3904 3528 __u64 flags; 3905 3529 __u32 shift; 3906 3530 __u32 pad; 3907 - }; 3531 + }; 3908 3532 3909 3533 4.103 KVM_PPC_RESIZE_HPT_COMMIT 3534 + ------------------------------- 3910 3535 3911 - Capability: KVM_CAP_SPAPR_RESIZE_HPT 3912 - Architectures: powerpc 3913 - Type: vm ioctl 3914 - Parameters: struct kvm_ppc_resize_hpt (in) 3915 - Returns: 0 on successful completion, 3536 + :Capability: KVM_CAP_SPAPR_RESIZE_HPT 3537 + :Architectures: powerpc 3538 + :Type: vm ioctl 3539 + :Parameters: struct kvm_ppc_resize_hpt (in) 3540 + :Returns: 0 on successful completion, 3916 3541 -EFAULT if struct kvm_ppc_resize_hpt cannot be read, 3917 - -EINVAL if the supplied shift or flags are invalid 3542 + -EINVAL if the supplied shift or flags are invalid, 3918 3543 -ENXIO if there is no pending HPT, or the pending HPT doesn't 3919 - have the requested size 3920 - -EBUSY if the pending HPT is not fully prepared 3544 + have the requested size, 3545 + -EBUSY if the pending HPT is not fully prepared, 3921 3546 -ENOSPC if there was a hash collision when moving existing 3922 - HPT entries to the new HPT 3547 + HPT entries to the new HPT, 3923 3548 -EIO on other error conditions 3924 3549 3925 3550 Used to implement the PAPR extension for runtime resizing of a guest's ··· 3945 3564 3946 3565 On failure, the guest will still be operating on its previous HPT.
3947 3566 3948 - struct kvm_ppc_resize_hpt { 3567 + :: 3568 + 3569 + struct kvm_ppc_resize_hpt { 3949 3570 __u64 flags; 3950 3571 __u32 shift; 3951 3572 __u32 pad; 3952 - }; 3573 + }; 3953 3574 3954 3575 4.104 KVM_X86_GET_MCE_CAP_SUPPORTED 3576 + ----------------------------------- 3955 3577 3956 - Capability: KVM_CAP_MCE 3957 - Architectures: x86 3958 - Type: system ioctl 3959 - Parameters: u64 mce_cap (out) 3960 - Returns: 0 on success, -1 on error 3578 + :Capability: KVM_CAP_MCE 3579 + :Architectures: x86 3580 + :Type: system ioctl 3581 + :Parameters: u64 mce_cap (out) 3582 + :Returns: 0 on success, -1 on error 3961 3583 3962 3584 Returns supported MCE capabilities. The u64 mce_cap parameter 3963 3585 has the same format as the MSR_IA32_MCG_CAP register. Supported 3964 3586 capabilities will have the corresponding bits set. 3965 3587 3966 3588 4.105 KVM_X86_SETUP_MCE 3589 + ----------------------- 3967 3590 3968 - Capability: KVM_CAP_MCE 3969 - Architectures: x86 3970 - Type: vcpu ioctl 3971 - Parameters: u64 mcg_cap (in) 3972 - Returns: 0 on success, 3591 + :Capability: KVM_CAP_MCE 3592 + :Architectures: x86 3593 + :Type: vcpu ioctl 3594 + :Parameters: u64 mcg_cap (in) 3595 + :Returns: 0 on success, 3973 3596 -EFAULT if u64 mcg_cap cannot be read, 3974 3597 -EINVAL if the requested number of banks is invalid, 3975 3598 -EINVAL if requested MCE capability is not supported. ··· 3986 3601 retrieved with KVM_X86_GET_MCE_CAP_SUPPORTED. 
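Since mcg_cap uses the MSR_IA32_MCG_CAP format, composing the KVM_X86_SETUP_MCE argument can be sketched as below. The helper names and `MCG_BANK_COUNT_MASK` are hypothetical; the only layout fact relied on is that bits 7:0 of IA32_MCG_CAP hold the bank count. `supported_caps` is assumed to come from KVM_X86_GET_MCE_CAP_SUPPORTED and `banks` from the KVM_CAP_MCE check.

```c
#include <stdint.h>

/* Bits 7:0 of IA32_MCG_CAP hold the number of error-reporting banks. */
#define MCG_BANK_COUNT_MASK 0xffULL

/* Combine the supported capability bits with the requested bank count. */
static uint64_t build_mcg_cap(uint64_t supported_caps, unsigned banks)
{
    return (supported_caps & ~MCG_BANK_COUNT_MASK) |
           ((uint64_t)banks & MCG_BANK_COUNT_MASK);
}

/* Extract the bank count from an mcg_cap value. */
static unsigned mcg_cap_banks(uint64_t mcg_cap)
{
    return (unsigned)(mcg_cap & MCG_BANK_COUNT_MASK);
}
```

A caller wanting fewer capabilities than the host supports would mask `supported_caps` before calling `build_mcg_cap`.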
3987 3602 3988 3603 4.106 KVM_X86_SET_MCE 3604 + --------------------- 3989 3605 3990 - Capability: KVM_CAP_MCE 3991 - Architectures: x86 3992 - Type: vcpu ioctl 3993 - Parameters: struct kvm_x86_mce (in) 3994 - Returns: 0 on success, 3606 + :Capability: KVM_CAP_MCE 3607 + :Architectures: x86 3608 + :Type: vcpu ioctl 3609 + :Parameters: struct kvm_x86_mce (in) 3610 + :Returns: 0 on success, 3995 3611 -EFAULT if struct kvm_x86_mce cannot be read, 3996 3612 -EINVAL if the bank number is invalid, 3997 3613 -EINVAL if VAL bit is not set in status field. 3998 3614 3999 3615 Inject a machine check error (MCE) into the guest. The input 4000 - parameter is: 3616 + parameter is:: 4001 3617 4002 - struct kvm_x86_mce { 3618 + struct kvm_x86_mce { 4003 3619 __u64 status; 4004 3620 __u64 addr; 4005 3621 __u64 misc; ··· 4008 3622 __u8 bank; 4009 3623 __u8 pad1[7]; 4010 3624 __u64 pad2[3]; 4011 - }; 3625 + }; 4012 3626 4013 3627 If the MCE being reported is an uncorrected error, KVM will 4014 3628 inject it as an MCE exception into the guest. If the guest ··· 4020 3634 not holding a previously reported uncorrected error). 4021 3635 4022 3636 4.107 KVM_S390_GET_CMMA_BITS 3637 + ---------------------------- 4023 3638 4024 - Capability: KVM_CAP_S390_CMMA_MIGRATION 4025 - Architectures: s390 4026 - Type: vm ioctl 4027 - Parameters: struct kvm_s390_cmma_log (in, out) 4028 - Returns: 0 on success, a negative value on error 3639 + :Capability: KVM_CAP_S390_CMMA_MIGRATION 3640 + :Architectures: s390 3641 + :Type: vm ioctl 3642 + :Parameters: struct kvm_s390_cmma_log (in, out) 3643 + :Returns: 0 on success, a negative value on error 4029 3644 4030 3645 This ioctl is used to get the values of the CMMA bits on the s390 4031 3646 architecture. It is meant to be used in two scenarios: 3647 + 4032 3648 - During live migration to save the CMMA values. Live migration needs 4033 3649 to be enabled via the KVM_REQ_START_MIGRATION VM property. 
4034 3650 - To non-destructively peek at the CMMA values, with the flag ··· 4040 3652 values are written to a buffer whose location is indicated via the "values" 4041 3653 member in the kvm_s390_cmma_log struct. The values in the input struct are 4042 3654 also updated as needed. 3655 + 4043 3656 Each CMMA value takes up one byte. 4044 3657 4045 - struct kvm_s390_cmma_log { 3658 + :: 3659 + 3660 + struct kvm_s390_cmma_log { 4046 3661 __u64 start_gfn; 4047 3662 __u32 count; 4048 3663 __u32 flags; ··· 4054 3663 __u64 mask; 4055 3664 }; 4056 3665 __u64 values; 4057 - }; 3666 + }; 4058 3667 4059 3668 start_gfn is the number of the first guest frame whose CMMA values are 4060 3669 to be retrieved, ··· 4115 3724 present for the addresses (e.g. when using hugepages). 4116 3725 4117 3726 4.108 KVM_S390_SET_CMMA_BITS 3727 + ---------------------------- 4118 3728 4119 - Capability: KVM_CAP_S390_CMMA_MIGRATION 4120 - Architectures: s390 4121 - Type: vm ioctl 4122 - Parameters: struct kvm_s390_cmma_log (in) 4123 - Returns: 0 on success, a negative value on error 3729 + :Capability: KVM_CAP_S390_CMMA_MIGRATION 3730 + :Architectures: s390 3731 + :Type: vm ioctl 3732 + :Parameters: struct kvm_s390_cmma_log (in) 3733 + :Returns: 0 on success, a negative value on error 4124 3734 4125 3735 This ioctl is used to set the values of the CMMA bits on the s390 4126 3736 architecture. It is meant to be used during live migration to restore ··· 4129 3737 The ioctl takes parameters via the kvm_s390_cmma_values struct. 4130 3738 Each CMMA value takes up one byte. 4131 3739 4132 - struct kvm_s390_cmma_log { 3740 + :: 3741 + 3742 + struct kvm_s390_cmma_log { 4133 3743 __u64 start_gfn; 4134 3744 __u32 count; 4135 3745 __u32 flags; 4136 3746 union { 4137 3747 __u64 remaining; 4138 3748 __u64 mask; 4139 - }; 3749 + }; 4140 3750 __u64 values; 4141 - }; 3751 + }; 4142 3752 4143 3753 start_gfn indicates the starting guest frame number, 4144 3754 ··· 4163 3769 hugepages). 
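Filling in the argument for KVM_S390_SET_CMMA_BITS could be sketched as follows. `cmma_log_sketch` is a local mirror of the documented struct and `make_cmma_set` is a hypothetical helper; the sketch assumes `flags` stays zero and that `mask` selects which bits of each one-byte value are applied, which should be checked against the full field descriptions.

```c
#include <stdint.h>
#include <string.h>

/* Local mirror of the documented struct; real code would use <linux/kvm.h>. */
struct cmma_log_sketch {
    uint64_t start_gfn;
    uint32_t count;
    uint32_t flags;
    union {
        uint64_t remaining;
        uint64_t mask;
    };
    uint64_t values;    /* userspace address of the value buffer */
};

/* vals holds one byte per guest frame, starting at start_gfn. */
static struct cmma_log_sketch make_cmma_set(uint64_t start_gfn,
                                            const uint8_t *vals,
                                            uint32_t count, uint64_t mask)
{
    struct cmma_log_sketch log;

    memset(&log, 0, sizeof(log));        /* flags left at zero (assumption) */
    log.start_gfn = start_gfn;
    log.count = count;
    log.mask = mask;
    log.values = (uint64_t)(uintptr_t)vals;
    return log;
}
```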
4164 3770 4165 3771 4.109 KVM_PPC_GET_CPU_CHAR 3772 + -------------------------- 4166 3773 4167 - Capability: KVM_CAP_PPC_GET_CPU_CHAR 4168 - Architectures: powerpc 4169 - Type: vm ioctl 4170 - Parameters: struct kvm_ppc_cpu_char (out) 4171 - Returns: 0 on successful completion 3774 + :Capability: KVM_CAP_PPC_GET_CPU_CHAR 3775 + :Architectures: powerpc 3776 + :Type: vm ioctl 3777 + :Parameters: struct kvm_ppc_cpu_char (out) 3778 + :Returns: 0 on successful completion, 4172 3779 -EFAULT if struct kvm_ppc_cpu_char cannot be written 4173 3780 4174 3781 This ioctl gives userspace information about certain characteristics 4175 3782 of the CPU relating to speculative execution of instructions and 4176 3783 possible information leakage resulting from speculative execution (see 4177 3784 CVE-2017-5715, CVE-2017-5753 and CVE-2017-5754). The information is 4178 - returned in struct kvm_ppc_cpu_char, which looks like this: 3785 + returned in struct kvm_ppc_cpu_char, which looks like this:: 4179 3786 4180 - struct kvm_ppc_cpu_char { 3787 + struct kvm_ppc_cpu_char { 4181 3788 __u64 character; /* characteristics of the CPU */ 4182 3789 __u64 behaviour; /* recommended software behaviour */ 4183 3790 __u64 character_mask; /* valid bits in character */ 4184 3791 __u64 behaviour_mask; /* valid bits in behaviour */ 4185 - }; 3792 + }; 4186 3793 4187 3794 For extensibility, the character_mask and behaviour_mask fields 4188 3795 indicate which bits of character and behaviour have been filled in by ··· 4210 3815 H_GET_CPU_CHARACTERISTICS hypercall. 
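Because only the bits flagged in character_mask and behaviour_mask have been filled in by the kernel, userspace has to treat every bit as tri-state. A minimal sketch of that check follows; the struct is a local mirror of the documented one, and `char_bit` and the enum are hypothetical names. No real flag constants are used, so callers pass whatever bit they are interested in.

```c
#include <stdint.h>

/* Local mirror of the documented struct; real code would use <linux/kvm.h>. */
struct cpu_char_sketch {
    uint64_t character;
    uint64_t behaviour;
    uint64_t character_mask;
    uint64_t behaviour_mask;
};

enum char_bit_state { BIT_UNKNOWN = -1, BIT_CLEAR = 0, BIT_SET = 1 };

/* Report a characteristic bit, honouring the validity mask. */
static enum char_bit_state char_bit(const struct cpu_char_sketch *c,
                                    uint64_t bit)
{
    if (!(c->character_mask & bit))
        return BIT_UNKNOWN;              /* kernel did not fill this bit in */
    return (c->character & bit) ? BIT_SET : BIT_CLEAR;
}
```

The same pattern applies to behaviour and behaviour_mask.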
4211 3816 4212 3817 4.110 KVM_MEMORY_ENCRYPT_OP 3818 + --------------------------- 4213 3819 4214 - Capability: basic 4215 - Architectures: x86 4216 - Type: system 4217 - Parameters: an opaque platform specific structure (in/out) 4218 - Returns: 0 on success; -1 on error 3820 + :Capability: basic 3821 + :Architectures: x86 3822 + :Type: system 3823 + :Parameters: an opaque platform specific structure (in/out) 3824 + :Returns: 0 on success; -1 on error 4219 3825 4220 3826 If the platform supports creating encrypted VMs then this ioctl can be used 4221 3827 for issuing platform-specific memory encryption commands to manage those ··· 4227 3831 Documentation/virt/kvm/amd-memory-encryption.rst. 4228 3832 4229 3833 4.111 KVM_MEMORY_ENCRYPT_REG_REGION 3834 + ----------------------------------- 4230 3835 4231 - Capability: basic 4232 - Architectures: x86 4233 - Type: system 4234 - Parameters: struct kvm_enc_region (in) 4235 - Returns: 0 on success; -1 on error 3836 + :Capability: basic 3837 + :Architectures: x86 3838 + :Type: system 3839 + :Parameters: struct kvm_enc_region (in) 3840 + :Returns: 0 on success; -1 on error 4236 3841 4237 3842 This ioctl can be used to register a guest memory region which may 4238 3843 contain encrypted data (e.g. guest RAM, SMRAM etc). ··· 4251 3854 memory region registered with the ioctl. 4252 3855 4253 3856 4.112 KVM_MEMORY_ENCRYPT_UNREG_REGION 3857 + ------------------------------------- 4254 3858 4255 - Capability: basic 4256 - Architectures: x86 4257 - Type: system 4258 - Parameters: struct kvm_enc_region (in) 4259 - Returns: 0 on success; -1 on error 3859 + :Capability: basic 3860 + :Architectures: x86 3861 + :Type: system 3862 + :Parameters: struct kvm_enc_region (in) 3863 + :Returns: 0 on success; -1 on error 4260 3864 4261 3865 This ioctl can be used to unregister the guest memory region registered 4262 3866 with KVM_MEMORY_ENCRYPT_REG_REGION ioctl above. 
4263 3867 4264 3868 4.113 KVM_HYPERV_EVENTFD 3869 + ------------------------ 4265 3870 4266 - Capability: KVM_CAP_HYPERV_EVENTFD 4267 - Architectures: x86 4268 - Type: vm ioctl 4269 - Parameters: struct kvm_hyperv_eventfd (in) 3871 + :Capability: KVM_CAP_HYPERV_EVENTFD 3872 + :Architectures: x86 3873 + :Type: vm ioctl 3874 + :Parameters: struct kvm_hyperv_eventfd (in) 4270 3875 4271 3876 This ioctl (un)registers an eventfd to receive notifications from the guest on 4272 3877 the specified Hyper-V connection id through the SIGNAL_EVENT hypercall, without 4273 3878 causing a user exit. SIGNAL_EVENT hypercall with non-zero event flag number 4274 3879 (bits 24-31) still triggers a KVM_EXIT_HYPERV_HCALL user exit. 4275 3880 4276 - struct kvm_hyperv_eventfd { 3881 + :: 3882 + 3883 + struct kvm_hyperv_eventfd { 4277 3884 __u32 conn_id; 4278 3885 __s32 fd; 4279 3886 __u32 flags; 4280 3887 __u32 padding[3]; 4281 - }; 3888 + }; 4282 3889 4283 - The conn_id field should fit within 24 bits: 3890 + The conn_id field should fit within 24 bits:: 4284 3891 4285 - #define KVM_HYPERV_CONN_ID_MASK 0x00ffffff 3892 + #define KVM_HYPERV_CONN_ID_MASK 0x00ffffff 4286 3893 4287 - The acceptable values for the flags field are: 3894 + The acceptable values for the flags field are:: 4288 3895 4289 - #define KVM_HYPERV_EVENTFD_DEASSIGN (1 << 0) 3896 + #define KVM_HYPERV_EVENTFD_DEASSIGN (1 << 0) 4290 3897 4291 - Returns: 0 on success, 4292 - -EINVAL if conn_id or flags is outside the allowed range 4293 - -ENOENT on deassign if the conn_id isn't registered 4294 - -EEXIST on assign if the conn_id is already registered 3898 + :Returns: 0 on success, 3899 + -EINVAL if conn_id or flags is outside the allowed range, 3900 + -ENOENT on deassign if the conn_id isn't registered, 3901 + -EEXIST on assign if the conn_id is already registered 4295 3902 4296 3903 4.114 KVM_GET_NESTED_STATE 3904 + -------------------------- 4297 3905 4298 - Capability: KVM_CAP_NESTED_STATE 4299 - Architectures: x86 4300 - 
Type: vcpu ioctl 4301 - Parameters: struct kvm_nested_state (in/out) 4302 - Returns: 0 on success, -1 on error 3906 + :Capability: KVM_CAP_NESTED_STATE 3907 + :Architectures: x86 3908 + :Type: vcpu ioctl 3909 + :Parameters: struct kvm_nested_state (in/out) 3910 + :Returns: 0 on success, -1 on error 3911 + 4303 3912 Errors: 4304 - E2BIG: the total state size exceeds the value of 'size' specified by 4305 - the user; the size required will be written into size. 4306 3913 4307 - struct kvm_nested_state { 3914 + ===== ============================================================= 3915 + E2BIG the total state size exceeds the value of 'size' specified by 3916 + the user; the size required will be written into size. 3917 + ===== ============================================================= 3918 + 3919 + :: 3920 + 3921 + struct kvm_nested_state { 4308 3922 __u16 flags; 4309 3923 __u16 format; 4310 3924 __u32 size; ··· 4332 3924 struct kvm_vmx_nested_state_data vmx[0]; 4333 3925 struct kvm_svm_nested_state_data svm[0]; 4334 3926 } data; 4335 - }; 3927 + }; 4336 3928 4337 - #define KVM_STATE_NESTED_GUEST_MODE 0x00000001 4338 - #define KVM_STATE_NESTED_RUN_PENDING 0x00000002 4339 - #define KVM_STATE_NESTED_EVMCS 0x00000004 3929 + #define KVM_STATE_NESTED_GUEST_MODE 0x00000001 3930 + #define KVM_STATE_NESTED_RUN_PENDING 0x00000002 3931 + #define KVM_STATE_NESTED_EVMCS 0x00000004 4340 3932 4341 - #define KVM_STATE_NESTED_FORMAT_VMX 0 4342 - #define KVM_STATE_NESTED_FORMAT_SVM 1 3933 + #define KVM_STATE_NESTED_FORMAT_VMX 0 3934 + #define KVM_STATE_NESTED_FORMAT_SVM 1 4343 3935 4344 - #define KVM_STATE_NESTED_VMX_VMCS_SIZE 0x1000 3936 + #define KVM_STATE_NESTED_VMX_VMCS_SIZE 0x1000 4345 3937 4346 - #define KVM_STATE_NESTED_VMX_SMM_GUEST_MODE 0x00000001 4347 - #define KVM_STATE_NESTED_VMX_SMM_VMXON 0x00000002 3938 + #define KVM_STATE_NESTED_VMX_SMM_GUEST_MODE 0x00000001 3939 + #define KVM_STATE_NESTED_VMX_SMM_VMXON 0x00000002 4348 3940 4349 - struct kvm_vmx_nested_state_hdr { 3941 
+ struct kvm_vmx_nested_state_hdr { 4350 3942 __u64 vmxon_pa; 4351 3943 __u64 vmcs12_pa; 4352 3944 4353 3945 struct { 4354 3946 __u16 flags; 4355 3947 } smm; 4356 - }; 3948 + }; 4357 3949 4358 - struct kvm_vmx_nested_state_data { 3950 + struct kvm_vmx_nested_state_data { 4359 3951 __u8 vmcs12[KVM_STATE_NESTED_VMX_VMCS_SIZE]; 4360 3952 __u8 shadow_vmcs12[KVM_STATE_NESTED_VMX_VMCS_SIZE]; 4361 - }; 3953 + }; 4362 3954 4363 3955 This ioctl copies the vcpu's nested virtualization state from the kernel to 4364 3956 userspace. ··· 4367 3959 to the KVM_CHECK_EXTENSION ioctl(). 4368 3960 4369 3961 4.115 KVM_SET_NESTED_STATE 3962 + -------------------------- 4370 3963 4371 - Capability: KVM_CAP_NESTED_STATE 4372 - Architectures: x86 4373 - Type: vcpu ioctl 4374 - Parameters: struct kvm_nested_state (in) 4375 - Returns: 0 on success, -1 on error 3964 + :Capability: KVM_CAP_NESTED_STATE 3965 + :Architectures: x86 3966 + :Type: vcpu ioctl 3967 + :Parameters: struct kvm_nested_state (in) 3968 + :Returns: 0 on success, -1 on error 4376 3969 4377 3970 This copies the vcpu's kvm_nested_state struct from userspace to the kernel. 4378 3971 For the definition of struct kvm_nested_state, see KVM_GET_NESTED_STATE. 4379 3972 4380 3973 4.116 KVM_(UN)REGISTER_COALESCED_MMIO 3974 + ------------------------------------- 4381 3975 4382 - Capability: KVM_CAP_COALESCED_MMIO (for coalesced mmio) 4383 - KVM_CAP_COALESCED_PIO (for coalesced pio) 4384 - Architectures: all 4385 - Type: vm ioctl 4386 - Parameters: struct kvm_coalesced_mmio_zone 4387 - Returns: 0 on success, < 0 on error 3976 + :Capability: KVM_CAP_COALESCED_MMIO (for coalesced mmio) 3977 + KVM_CAP_COALESCED_PIO (for coalesced pio) 3978 + :Architectures: all 3979 + :Type: vm ioctl 3980 + :Parameters: struct kvm_coalesced_mmio_zone 3981 + :Returns: 0 on success, < 0 on error 4388 3982 4389 3983 Coalesced I/O is a performance optimization that defers hardware 4390 3984 register write emulation so that userspace exits are avoided. 
It is ··· 4408 3998 to I/O ports. 4409 3999 4410 4000 4.117 KVM_CLEAR_DIRTY_LOG (vm ioctl) 4001 + ------------------------------------ 4411 4002 4412 - Capability: KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2 4413 - Architectures: x86, arm, arm64, mips 4414 - Type: vm ioctl 4415 - Parameters: struct kvm_dirty_log (in) 4416 - Returns: 0 on success, -1 on error 4003 + :Capability: KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2 4004 + :Architectures: x86, arm, arm64, mips 4005 + :Type: vm ioctl 4006 + :Parameters: struct kvm_dirty_log (in) 4007 + :Returns: 0 on success, -1 on error 4417 4008 4418 - /* for KVM_CLEAR_DIRTY_LOG */ 4419 - struct kvm_clear_dirty_log { 4009 + :: 4010 + 4011 + /* for KVM_CLEAR_DIRTY_LOG */ 4012 + struct kvm_clear_dirty_log { 4420 4013 __u32 slot; 4421 4014 __u32 num_pages; 4422 4015 __u64 first_page; ··· 4427 4014 void __user *dirty_bitmap; /* one bit per page */ 4428 4015 __u64 padding; 4429 4016 }; 4430 - }; 4017 + }; 4431 4018 4432 4019 The ioctl clears the dirty status of pages in a memory slot, according to 4433 4020 the bitmap that is passed in struct kvm_clear_dirty_log's dirty_bitmap ··· 4451 4038 that KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2 is present. 
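For KVM_CLEAR_DIRTY_LOG, the dirty_bitmap holds one bit per page. The helpers below sketch how a caller might size and fill such a bitmap. Both names are hypothetical, and two points are assumptions of this sketch rather than statements from the text shown here: the bitmap is rounded up to whole 64-bit words, and bit 0 corresponds to first_page.

```c
#include <stddef.h>
#include <stdint.h>

/* Bytes needed for num_pages bits, rounded up to whole 64-bit words. */
static size_t dirty_bitmap_bytes(uint32_t num_pages)
{
    return (((size_t)num_pages + 63) / 64) * 8;
}

/*
 * Set the bit for `page` (a page offset within the slot), assuming
 * bit 0 of the bitmap corresponds to first_page.
 */
static void mark_page_dirty_bit(uint64_t *bitmap, uint64_t first_page,
                                uint64_t page)
{
    uint64_t bit = page - first_page;

    bitmap[bit / 64] |= 1ULL << (bit % 64);
}
```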
4452 4039 4453 4040 4.118 KVM_GET_SUPPORTED_HV_CPUID 4041 + -------------------------------- 4454 4042 4455 - Capability: KVM_CAP_HYPERV_CPUID 4456 - Architectures: x86 4457 - Type: vcpu ioctl 4458 - Parameters: struct kvm_cpuid2 (in/out) 4459 - Returns: 0 on success, -1 on error 4043 + :Capability: KVM_CAP_HYPERV_CPUID 4044 + :Architectures: x86 4045 + :Type: vcpu ioctl 4046 + :Parameters: struct kvm_cpuid2 (in/out) 4047 + :Returns: 0 on success, -1 on error 4460 4048 4461 - struct kvm_cpuid2 { 4049 + :: 4050 + 4051 + struct kvm_cpuid2 { 4462 4052 __u32 nent; 4463 4053 __u32 padding; 4464 4054 struct kvm_cpuid_entry2 entries[0]; 4465 - }; 4055 + }; 4466 4056 4467 - struct kvm_cpuid_entry2 { 4057 + struct kvm_cpuid_entry2 { 4468 4058 __u32 function; 4469 4059 __u32 index; 4470 4060 __u32 flags; ··· 4476 4060 __u32 ecx; 4477 4061 __u32 edx; 4478 4062 __u32 padding[3]; 4479 - }; 4063 + }; 4480 4064 4481 4065 This ioctl returns x86 cpuid features leaves related to Hyper-V emulation in 4482 4066 KVM. Userspace can use the information returned by this ioctl to construct ··· 4489 4073 leaves (0x40000000, 0x40000001). 4490 4074 4491 4075 Currently, the following list of CPUID leaves are returned: 4492 - HYPERV_CPUID_VENDOR_AND_MAX_FUNCTIONS 4493 - HYPERV_CPUID_INTERFACE 4494 - HYPERV_CPUID_VERSION 4495 - HYPERV_CPUID_FEATURES 4496 - HYPERV_CPUID_ENLIGHTMENT_INFO 4497 - HYPERV_CPUID_IMPLEMENT_LIMITS 4498 - HYPERV_CPUID_NESTED_FEATURES 4076 + - HYPERV_CPUID_VENDOR_AND_MAX_FUNCTIONS 4077 + - HYPERV_CPUID_INTERFACE 4078 + - HYPERV_CPUID_VERSION 4079 + - HYPERV_CPUID_FEATURES 4080 + - HYPERV_CPUID_ENLIGHTMENT_INFO 4081 + - HYPERV_CPUID_IMPLEMENT_LIMITS 4082 + - HYPERV_CPUID_NESTED_FEATURES 4499 4083 4500 4084 HYPERV_CPUID_NESTED_FEATURES leaf is only exposed when Enlightened VMCS was 4501 4085 enabled on the corresponding vCPU (KVM_CAP_HYPERV_ENLIGHTENED_VMCS). ··· 4511 4095 userspace should not expect to get any particular value there. 
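struct kvm_cpuid2 ends in a variable-size entries[] array, so the caller has to allocate room for the header plus nent entries before calling KVM_GET_SUPPORTED_HV_CPUID. A sketch of that allocation follows. The structs are local mirrors (the eax and ebx members fall in the part of the struct elided from this hunk and are filled in from the well-known layout), and `cpuid2_size` and `cpuid2_alloc` are hypothetical helper names.

```c
#include <stdint.h>
#include <stdlib.h>

/* Local mirrors of the documented structs; real code would use <linux/kvm.h>. */
struct cpuid_entry2_sketch {
    uint32_t function;
    uint32_t index;
    uint32_t flags;
    uint32_t eax;
    uint32_t ebx;
    uint32_t ecx;
    uint32_t edx;
    uint32_t padding[3];
};

struct cpuid2_sketch {
    uint32_t nent;
    uint32_t padding;
    struct cpuid_entry2_sketch entries[];
};

/* Total bytes for a header followed by nent entries. */
static size_t cpuid2_size(uint32_t nent)
{
    return sizeof(struct cpuid2_sketch) +
           (size_t)nent * sizeof(struct cpuid_entry2_sketch);
}

/* Allocate a zeroed structure with nent pre-set; the caller frees it. */
static struct cpuid2_sketch *cpuid2_alloc(uint32_t nent)
{
    struct cpuid2_sketch *c = calloc(1, cpuid2_size(nent));

    if (c)
        c->nent = nent;
    return c;
}
```

With the seven leaves listed above, `cpuid2_alloc(7)` would be a natural starting size, though callers should not rely on that count staying fixed.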
4512 4096 4513 4097 4.119 KVM_ARM_VCPU_FINALIZE 4098 + --------------------------- 4514 4099 4515 - Architectures: arm, arm64 4516 - Type: vcpu ioctl 4517 - Parameters: int feature (in) 4518 - Returns: 0 on success, -1 on error 4100 + :Architectures: arm, arm64 4101 + :Type: vcpu ioctl 4102 + :Parameters: int feature (in) 4103 + :Returns: 0 on success, -1 on error 4104 + 4519 4105 Errors: 4520 - EPERM: feature not enabled, needs configuration, or already finalized 4521 - EINVAL: feature unknown or not present 4106 + 4107 + ====== ============================================================== 4108 + EPERM feature not enabled, needs configuration, or already finalized 4109 + EINVAL feature unknown or not present 4110 + ====== ============================================================== 4522 4111 4523 4112 Recognised values for feature: 4113 + 4114 + ===== =========================================== 4524 4115 arm64 KVM_ARM_VCPU_SVE (requires KVM_CAP_ARM_SVE) 4116 + ===== =========================================== 4525 4117 4526 4118 Finalizes the configuration of the specified vcpu feature. 4527 4119 ··· 4553 4129 using this ioctl. 4554 4130 4555 4131 4.120 KVM_SET_PMU_EVENT_FILTER 4132 + ------------------------------ 4556 4133 4557 - Capability: KVM_CAP_PMU_EVENT_FILTER 4558 - Architectures: x86 4559 - Type: vm ioctl 4560 - Parameters: struct kvm_pmu_event_filter (in) 4561 - Returns: 0 on success, -1 on error 4134 + :Capability: KVM_CAP_PMU_EVENT_FILTER 4135 + :Architectures: x86 4136 + :Type: vm ioctl 4137 + :Parameters: struct kvm_pmu_event_filter (in) 4138 + :Returns: 0 on success, -1 on error 4562 4139 4563 - struct kvm_pmu_event_filter { 4140 + :: 4141 + 4142 + struct kvm_pmu_event_filter { 4564 4143 __u32 action; 4565 4144 __u32 nevents; 4566 4145 __u32 fixed_counter_bitmap; 4567 4146 __u32 flags; 4568 4147 __u32 pad[4]; 4569 4148 __u64 events[0]; 4570 - }; 4149 + }; 4571 4150 4572 4151 This ioctl restricts the set of PMU events that the guest can program. 
4573 4152 The argument holds a list of events which will be allowed or denied. ··· 4581 4154 4582 4155 No flags are defined yet, the field must be zero. 4583 4156 4584 - Valid values for 'action': 4585 - #define KVM_PMU_EVENT_ALLOW 0 4586 - #define KVM_PMU_EVENT_DENY 1 4157 + Valid values for 'action':: 4158 + 4159 + #define KVM_PMU_EVENT_ALLOW 0 4160 + #define KVM_PMU_EVENT_DENY 1 4587 4161 4588 4162 4.121 KVM_PPC_SVM_OFF 4163 + --------------------- 4589 4164 4590 - Capability: basic 4591 - Architectures: powerpc 4592 - Type: vm ioctl 4593 - Parameters: none 4594 - Returns: 0 on successful completion, 4165 + :Capability: basic 4166 + :Architectures: powerpc 4167 + :Type: vm ioctl 4168 + :Parameters: none 4169 + :Returns: 0 on successful completion, 4170 + 4595 4171 Errors: 4596 - EINVAL: if ultravisor failed to terminate the secure guest 4597 - ENOMEM: if hypervisor failed to allocate new radix page tables for guest 4172 + 4173 + ====== ================================================================ 4174 + EINVAL if ultravisor failed to terminate the secure guest 4175 + ENOMEM if hypervisor failed to allocate new radix page tables for guest 4176 + ====== ================================================================ 4598 4177 4599 4178 This ioctl is used to turn off the secure mode of the guest or transition 4600 4179 the guest from secure mode to normal mode. This is invoked when the guest ··· 4647 4214 4648 4215 4649 4216 5. The kvm_run structure 4650 - ------------------------ 4217 + ======================== 4651 4218 4652 4219 Application code obtains a pointer to the kvm_run structure by 4653 4220 mmap()ing a vcpu fd. From that point, application code can control ··· 4655 4222 ioctl, and obtain information about the reason KVM_RUN returned by 4656 4223 looking up structure members. 
4657 4224 4658 - struct kvm_run { 4225 + :: 4226 + 4227 + struct kvm_run { 4659 4228 /* in */ 4660 4229 __u8 request_interrupt_window; 4661 4230 4662 4231 Request that KVM_RUN return when it becomes possible to inject external 4663 4232 interrupts into the guest. Useful in conjunction with KVM_INTERRUPT. 4233 + 4234 + :: 4664 4235 4665 4236 __u8 immediate_exit; 4666 4237 ··· 4677 4240 4678 4241 This field is ignored if KVM_CAP_IMMEDIATE_EXIT is not available. 4679 4242 4243 + :: 4244 + 4680 4245 __u8 padding1[6]; 4681 4246 4682 4247 /* out */ ··· 4688 4249 application code why KVM_RUN has returned. Allowable values for this 4689 4250 field are detailed below. 4690 4251 4252 + :: 4253 + 4691 4254 __u8 ready_for_interrupt_injection; 4692 4255 4693 4256 If request_interrupt_window has been specified, this field indicates 4694 4257 an interrupt can be injected now with KVM_INTERRUPT. 4695 4258 4259 + :: 4260 + 4696 4261 __u8 if_flag; 4697 4262 4698 4263 The value of the current interrupt flag. Only valid if in-kernel 4699 4264 local APIC is not used. 4265 + 4266 + :: 4700 4267 4701 4268 __u16 flags; 4702 4269 ··· 4711 4266 KVM_RUN_X86_SMM, which is valid on x86 machines and is set if the 4712 4267 VCPU is in system management mode. 4713 4268 4269 + :: 4270 + 4714 4271 /* in (pre_kvm_run), out (post_kvm_run) */ 4715 4272 __u64 cr8; 4716 4273 4717 4274 The value of the cr8 register. Only valid if in-kernel local APIC is 4718 4275 not used. Both input and output. 4719 4276 4277 + :: 4278 + 4720 4279 __u64 apic_base; 4721 4280 4722 4281 The value of the APIC BASE msr. Only valid if in-kernel local 4723 4282 APIC is not used. Both input and output. 4283 + 4284 + :: 4724 4285 4725 4286 union { 4726 4287 /* KVM_EXIT_UNKNOWN */ ··· 4738 4287 reasons. Further architecture-specific information is available in 4739 4288 hardware_exit_reason. 
4740 4289 4290 + :: 4291 + 4741 4292 /* KVM_EXIT_FAIL_ENTRY */ 4742 4293 struct { 4743 4294 __u64 hardware_entry_failure_reason; ··· 4749 4296 to unknown reasons. Further architecture-specific information is 4750 4297 available in hardware_entry_failure_reason. 4751 4298 4299 + :: 4300 + 4752 4301 /* KVM_EXIT_EXCEPTION */ 4753 4302 struct { 4754 4303 __u32 exception; ··· 4759 4304 4760 4305 Unused. 4761 4306 4307 + :: 4308 + 4762 4309 /* KVM_EXIT_IO */ 4763 4310 struct { 4764 - #define KVM_EXIT_IO_IN 0 4765 - #define KVM_EXIT_IO_OUT 1 4311 + #define KVM_EXIT_IO_IN 0 4312 + #define KVM_EXIT_IO_OUT 1 4766 4313 __u8 direction; 4767 4314 __u8 size; /* bytes */ 4768 4315 __u16 port; ··· 4778 4321 where kvm expects application code to place the data for the next 4779 4322 KVM_RUN invocation (KVM_EXIT_IO_IN). Data format is a packed array. 4780 4323 4324 + :: 4325 + 4781 4326 /* KVM_EXIT_DEBUG */ 4782 4327 struct { 4783 4328 struct kvm_debug_exit_arch arch; ··· 4787 4328 4788 4329 If the exit_reason is KVM_EXIT_DEBUG, then a vcpu is processing a debug event 4789 4330 for which architecture specific information is returned. 4331 + 4332 + :: 4790 4333 4791 4334 /* KVM_EXIT_MMIO */ 4792 4335 struct { ··· 4807 4346 appear if the VCPU performed a load or store of the appropriate width directly 4808 4347 to the byte array. 4809 4348 4810 - NOTE: For KVM_EXIT_IO, KVM_EXIT_MMIO, KVM_EXIT_OSI, KVM_EXIT_PAPR and 4349 + .. note:: 4350 + 4351 + For KVM_EXIT_IO, KVM_EXIT_MMIO, KVM_EXIT_OSI, KVM_EXIT_PAPR and 4811 4352 KVM_EXIT_EPR the corresponding 4353 + 4812 4354 operations are complete (and guest state is consistent) only after userspace 4813 4355 has re-entered the kernel with KVM_RUN. The kernel side will first finish 4814 4356 incomplete operations and then check for pending signals. Userspace 4815 4357 can re-enter the guest with an unmasked signal pending to complete 4816 4358 pending operations. 
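The KVM_EXIT_IO description above can be sketched as follows. This is an illustration under stated assumptions, not KVM code: `handle_pio`, `pio_fn` and `latch_dev` are hypothetical names, and the parameters stand in for the `direction`, `size`, `port`, `count` and `data_offset` members of the `io` struct. `run_base` is assumed to be the start of the mmap()ed kvm_run area, since data_offset is relative to it.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define EXIT_IO_IN  0   /* mirrors KVM_EXIT_IO_IN */
#define EXIT_IO_OUT 1   /* mirrors KVM_EXIT_IO_OUT */

/* Hypothetical device callback: one access of `size` bytes on `port`. */
typedef void (*pio_fn)(uint16_t port, int is_out, void *data, uint8_t size);

/*
 * Walk the packed array that starts data_offset bytes into the kvm_run
 * mapping: `count` elements of `size` bytes each, with no padding.
 * For OUT the device consumes the guest's data; for IN the device fills
 * the slot, and the guest sees it only after the next KVM_RUN.
 */
static void handle_pio(void *run_base, uint8_t direction, uint8_t size,
                       uint16_t port, uint32_t count, uint64_t data_offset,
                       pio_fn dev)
{
    uint8_t *data = (uint8_t *)run_base + data_offset;

    for (uint32_t i = 0; i < count; i++)
        dev(port, direction == EXIT_IO_OUT, data + (size_t)i * size, size);
}

/* Toy device for illustration: remembers the last byte written and
 * returns that byte in every position on reads. */
static uint8_t latch;

static void latch_dev(uint16_t port, int is_out, void *data, uint8_t size)
{
    (void)port;
    if (is_out)
        latch = ((uint8_t *)data)[size - 1];
    else
        memset(data, latch, size);
}
```

In line with the note above, filling the buffer for KVM_EXIT_IO_IN does not complete the guest's instruction by itself; the operation finishes when userspace re-enters the kernel with KVM_RUN.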
4359 + 4360 + :: 4817 4361 4818 4362 /* KVM_EXIT_HYPERCALL */ 4819 4363 struct { ··· 4831 4365 4832 4366 Unused. This was once used for 'hypercall to userspace'. To implement 4833 4367 such functionality, use KVM_EXIT_IO (x86) or KVM_EXIT_MMIO (all except s390). 4834 - Note KVM_EXIT_IO is significantly faster than KVM_EXIT_MMIO. 4368 + 4369 + .. note:: KVM_EXIT_IO is significantly faster than KVM_EXIT_MMIO. 4370 + 4371 + :: 4835 4372 4836 4373 /* KVM_EXIT_TPR_ACCESS */ 4837 4374 struct { ··· 4844 4375 } tpr_access; 4845 4376 4846 4377 To be documented (KVM_TPR_ACCESS_REPORTING). 4378 + 4379 + :: 4847 4380 4848 4381 /* KVM_EXIT_S390_SIEIC */ 4849 4382 struct { ··· 4858 4387 4859 4388 s390 specific. 4860 4389 4390 + :: 4391 + 4861 4392 /* KVM_EXIT_S390_RESET */ 4862 - #define KVM_S390_RESET_POR 1 4863 - #define KVM_S390_RESET_CLEAR 2 4864 - #define KVM_S390_RESET_SUBSYSTEM 4 4865 - #define KVM_S390_RESET_CPU_INIT 8 4866 - #define KVM_S390_RESET_IPL 16 4393 + #define KVM_S390_RESET_POR 1 4394 + #define KVM_S390_RESET_CLEAR 2 4395 + #define KVM_S390_RESET_SUBSYSTEM 4 4396 + #define KVM_S390_RESET_CPU_INIT 8 4397 + #define KVM_S390_RESET_IPL 16 4867 4398 __u64 s390_reset_flags; 4868 4399 4869 4400 s390 specific. 4401 + 4402 + :: 4870 4403 4871 4404 /* KVM_EXIT_S390_UCONTROL */ 4872 4405 struct { ··· 4886 4411 Principles of Operation Book in the Chapter for Dynamic Address Translation 4887 4412 (DAT) 4888 4413 4414 + :: 4415 + 4889 4416 /* KVM_EXIT_DCR */ 4890 4417 struct { 4891 4418 __u32 dcrn; ··· 4896 4419 } dcr; 4897 4420 4898 4421 Deprecated - was used for 440 KVM. 4422 + 4423 + :: 4899 4424 4900 4425 /* KVM_EXIT_OSI */ 4901 4426 struct { ··· 4911 4432 Userspace can now handle the hypercall and when it's done modify the gprs as 4912 4433 necessary. Upon guest entry all guest GPRs will then be replaced by the values 4913 4434 in this struct. 
4435 + 4436 + :: 4914 4437 4915 4438 /* KVM_EXIT_PAPR_HCALL */ 4916 4439 struct { ··· 4931 4450 Requirements (PAPR) document available from www.power.org (free 4932 4451 developer registration required to access it). 4933 4452 4453 + :: 4454 + 4934 4455 /* KVM_EXIT_S390_TSCH */ 4935 4456 struct { 4936 4457 __u16 subchannel_id; ··· 4948 4465 interrupt for the target subchannel has been dequeued and subchannel_id, 4949 4466 subchannel_nr, io_int_parm and io_int_word contain the parameters for that 4950 4467 interrupt. ipb is needed for instruction parameter decoding. 4468 + 4469 + :: 4951 4470 4952 4471 /* KVM_EXIT_EPR */ 4953 4472 struct { ··· 4970 4485 external interrupt has just been delivered into the guest. User space 4971 4486 should put the acknowledged interrupt vector into the 'epr' field. 4972 4487 4488 + :: 4489 + 4973 4490 /* KVM_EXIT_SYSTEM_EVENT */ 4974 4491 struct { 4975 - #define KVM_SYSTEM_EVENT_SHUTDOWN 1 4976 - #define KVM_SYSTEM_EVENT_RESET 2 4977 - #define KVM_SYSTEM_EVENT_CRASH 3 4492 + #define KVM_SYSTEM_EVENT_SHUTDOWN 1 4493 + #define KVM_SYSTEM_EVENT_RESET 2 4494 + #define KVM_SYSTEM_EVENT_CRASH 3 4978 4495 __u32 type; 4979 4496 __u64 flags; 4980 4497 } system_event; ··· 4989 4502 specific flags for the system-level event. 4990 4503 4991 4504 Valid values for 'type' are: 4992 - KVM_SYSTEM_EVENT_SHUTDOWN -- the guest has requested a shutdown of the 4505 + 4506 + - KVM_SYSTEM_EVENT_SHUTDOWN -- the guest has requested a shutdown of the 4993 4507 VM. Userspace is not obliged to honour this, and if it does honour 4994 4508 this does not need to destroy the VM synchronously (ie it may call 4995 4509 KVM_RUN again before shutdown finally occurs). 4996 - KVM_SYSTEM_EVENT_RESET -- the guest has requested a reset of the VM. 4510 + - KVM_SYSTEM_EVENT_RESET -- the guest has requested a reset of the VM. 
4997 4511 As with SHUTDOWN, userspace can choose to ignore the request, or 4998 4512 to schedule the reset to occur in the future and may call KVM_RUN again. 4999 - KVM_SYSTEM_EVENT_CRASH -- the guest crash occurred and the guest 4513 + - KVM_SYSTEM_EVENT_CRASH -- the guest crash occurred and the guest 5000 4514 has requested a crash condition maintenance. Userspace can choose 5001 4515 to ignore the request, or to gather VM memory core dump and/or 5002 4516 reset/shutdown of the VM. 4517 + 4518 + :: 5003 4519 5004 4520 /* KVM_EXIT_IOAPIC_EOI */ 5005 4521 struct { ··· 5016 4526 it is still asserted. Vector is the LAPIC interrupt vector for which the 5017 4527 EOI was received. 5018 4528 4529 + :: 4530 + 5019 4531 struct kvm_hyperv_exit { 5020 - #define KVM_EXIT_HYPERV_SYNIC 1 5021 - #define KVM_EXIT_HYPERV_HCALL 2 4532 + #define KVM_EXIT_HYPERV_SYNIC 1 4533 + #define KVM_EXIT_HYPERV_HCALL 2 5022 4534 __u32 type; 5023 4535 union { 5024 4536 struct { ··· 5038 4546 }; 5039 4547 /* KVM_EXIT_HYPERV */ 5040 4548 struct kvm_hyperv_exit hyperv; 4549 + 5041 4550 Indicates that the VCPU exits into userspace to process some tasks 5042 4551 related to Hyper-V emulation. 4552 + 5043 4553 Valid values for 'type' are: 5044 - KVM_EXIT_HYPERV_SYNIC -- synchronously notify user-space about 4554 + 4555 + - KVM_EXIT_HYPERV_SYNIC -- synchronously notify user-space about 4556 + 5045 4557 Hyper-V SynIC state change. Notification is used to remap SynIC 5046 4558 event/message pages and to enable/disable SynIC messages/events processing 5047 4559 in userspace. 4560 + 4561 + :: 5048 4562 5049 4563 /* KVM_EXIT_ARM_NISV */ 5050 4564 struct { ··· 5085 4587 KVM_EXIT_MMIO, but userspace has to emulate any change to the processing state 5086 4588 if it decides to decode and emulate the instruction. 5087 4589 4590 + :: 4591 + 5088 4592 /* Fix the size of the union. 
*/ 5089 4593 char padding[256]; 5090 4594 }; ··· 5111 4611 Userspace can query the validity of the structure by checking 5112 4612 kvm_valid_regs for specific bits. These bits are architecture specific 5113 4613 and usually define the validity of a groups of registers. (e.g. one bit 5114 - for general purpose registers) 4614 + for general purpose registers) 5115 4615 5116 4616 Please note that the kernel is allowed to use the kvm_run structure as the 5117 4617 primary storage for certain register types. Therefore, the kernel may use the 5118 4618 values in kvm_run even if the corresponding bit in kvm_dirty_regs is not set. 5119 4619 5120 - }; 4620 + :: 4621 + 4622 + }; 5121 4623 5122 4624 5123 4625 5124 4626 6. Capabilities that can be enabled on vCPUs 5125 - -------------------------------------------- 4627 + ============================================ 5126 4628 5127 4629 There are certain capabilities that change the behavior of the virtual CPU or 5128 4630 the virtual machine when enabled. To enable them, please see section 4.37. ··· 5133 4631 5134 4632 The following information is provided along with the description: 5135 4633 5136 - Architectures: which instruction set architectures provide this ioctl. 4634 + Architectures: 4635 + which instruction set architectures provide this ioctl. 5137 4636 x86 includes both i386 and x86_64. 5138 4637 5139 - Target: whether this is a per-vcpu or per-vm capability. 4638 + Target: 4639 + whether this is a per-vcpu or per-vm capability. 5140 4640 5141 - Parameters: what parameters are accepted by the capability. 4641 + Parameters: 4642 + what parameters are accepted by the capability. 5142 4643 5143 - Returns: the return value. General error numbers (EBADF, ENOMEM, EINVAL) 4644 + Returns: 4645 + the return value. General error numbers (EBADF, ENOMEM, EINVAL) 5144 4646 are not detailed, but errors with specific meanings are. 
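The capabilities listed in this section are enabled with the KVM_ENABLE_CAP ioctl (section 4.37). A minimal sketch of such a call, assuming a hypothetical wrapper named `enable_cap` that passes at most two arguments, might look like this:

```c
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/*
 * Hypothetical wrapper around KVM_ENABLE_CAP.
 *   fd          vcpu fd or vm fd, depending on the capability's target
 *   cap         a KVM_CAP_xyz constant
 *   arg0, arg1  copied into args[0] and args[1]; pass 0 when unused
 * Returns the ioctl result: 0 on success, -1 with errno set on error.
 */
static int enable_cap(int fd, uint32_t cap, uint64_t arg0, uint64_t arg1)
{
    struct kvm_enable_cap ec;

    /* Zero everything first so unused fields and padding are clean. */
    memset(&ec, 0, sizeof(ec));
    ec.cap = cap;
    ec.args[0] = arg0;
    ec.args[1] = arg1;

    return ioctl(fd, KVM_ENABLE_CAP, &ec);
}
```

For a capability documented with "Parameters: none", both arguments would simply be 0; for one that takes a device fd and a CPU number, they would go in `arg0` and `arg1` respectively.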
5145 4647 5146 4648 5147 4649 6.1 KVM_CAP_PPC_OSI 4650 + ------------------- 5148 4651 5149 - Architectures: ppc 5150 - Target: vcpu 5151 - Parameters: none 5152 - Returns: 0 on success; -1 on error 4652 + :Architectures: ppc 4653 + :Target: vcpu 4654 + :Parameters: none 4655 + :Returns: 0 on success; -1 on error 5153 4656 5154 4657 This capability enables interception of OSI hypercalls that otherwise would 5155 4658 be treated as normal system calls to be injected into the guest. OSI hypercalls ··· 5165 4658 5166 4659 5167 4660 6.2 KVM_CAP_PPC_PAPR 4661 + -------------------- 5168 4662 5169 - Architectures: ppc 5170 - Target: vcpu 5171 - Parameters: none 5172 - Returns: 0 on success; -1 on error 4663 + :Architectures: ppc 4664 + :Target: vcpu 4665 + :Parameters: none 4666 + :Returns: 0 on success; -1 on error 5173 4667 5174 4668 This capability enables interception of PAPR hypercalls. PAPR hypercalls are 5175 4669 done using the hypercall instruction "sc 1". ··· 5186 4678 5187 4679 5188 4680 6.3 KVM_CAP_SW_TLB 4681 + ------------------ 5189 4682 5190 - Architectures: ppc 5191 - Target: vcpu 5192 - Parameters: args[0] is the address of a struct kvm_config_tlb 5193 - Returns: 0 on success; -1 on error 4683 + :Architectures: ppc 4684 + :Target: vcpu 4685 + :Parameters: args[0] is the address of a struct kvm_config_tlb 4686 + :Returns: 0 on success; -1 on error 5194 4687 5195 - struct kvm_config_tlb { 4688 + :: 4689 + 4690 + struct kvm_config_tlb { 5196 4691 __u64 params; 5197 4692 __u64 array; 5198 4693 __u32 mmu_type; 5199 4694 __u32 array_len; 5200 - }; 4695 + }; 5201 4696 5202 4697 Configures the virtual CPU's TLB array, establishing a shared memory area 5203 4698 between userspace and KVM. The "params" and "array" fields are userspace ··· 5219 4708 on this vcpu. 5220 4709 5221 4710 For mmu types KVM_MMU_FSL_BOOKE_NOHV and KVM_MMU_FSL_BOOKE_HV: 4711 + 5222 4712 - The "params" field is of type "struct kvm_book3e_206_tlb_params". 
5223 4713 - The "array" field points to an array of type "struct 5224 4714 kvm_book3e_206_tlb_entry". ··· 5233 4721 hardware ignores this value for TLB0. 5234 4722 5235 4723 6.4 KVM_CAP_S390_CSS_SUPPORT 4724 + ---------------------------- 5236 4725 5237 - Architectures: s390 5238 - Target: vcpu 5239 - Parameters: none 5240 - Returns: 0 on success; -1 on error 4726 + :Architectures: s390 4727 + :Target: vcpu 4728 + :Parameters: none 4729 + :Returns: 0 on success; -1 on error 5241 4730 5242 4731 This capability enables support for handling of channel I/O instructions. 5243 4732 ··· 5252 4739 virtual machine is affected. 5253 4740 5254 4741 6.5 KVM_CAP_PPC_EPR 4742 + ------------------- 5255 4743 5256 - Architectures: ppc 5257 - Target: vcpu 5258 - Parameters: args[0] defines whether the proxy facility is active 5259 - Returns: 0 on success; -1 on error 4744 + :Architectures: ppc 4745 + :Target: vcpu 4746 + :Parameters: args[0] defines whether the proxy facility is active 4747 + :Returns: 0 on success; -1 on error 5260 4748 5261 4749 This capability enables or disables the delivery of interrupts through the 5262 4750 external proxy facility. ··· 5271 4757 When this capability is enabled, KVM_EXIT_EPR can occur. 5272 4758 5273 4759 6.6 KVM_CAP_IRQ_MPIC 4760 + -------------------- 5274 4761 5275 - Architectures: ppc 5276 - Parameters: args[0] is the MPIC device fd 5277 - args[1] is the MPIC CPU number for this vcpu 4762 + :Architectures: ppc 4763 + :Parameters: args[0] is the MPIC device fd; 4764 + args[1] is the MPIC CPU number for this vcpu 5278 4765 5279 4766 This capability connects the vcpu to an in-kernel MPIC device. 
5280 4767 5281 4768 6.7 KVM_CAP_IRQ_XICS 4769 + -------------------- 5282 4770 5283 - Architectures: ppc 5284 - Target: vcpu 5285 - Parameters: args[0] is the XICS device fd 5286 - args[1] is the XICS CPU number (server ID) for this vcpu 4771 + :Architectures: ppc 4772 + :Target: vcpu 4773 + :Parameters: args[0] is the XICS device fd; 4774 + args[1] is the XICS CPU number (server ID) for this vcpu 5287 4775 5288 4776 This capability connects the vcpu to an in-kernel XICS device. 5289 4777 5290 4778 6.8 KVM_CAP_S390_IRQCHIP 4779 + ------------------------ 5291 4780 5292 - Architectures: s390 5293 - Target: vm 5294 - Parameters: none 4781 + :Architectures: s390 4782 + :Target: vm 4783 + :Parameters: none 5295 4784 5296 4785 This capability enables the in-kernel irqchip for s390. Please refer to 5297 4786 "4.24 KVM_CREATE_IRQCHIP" for details. 5298 4787 5299 4788 6.9 KVM_CAP_MIPS_FPU 4789 + -------------------- 5300 4790 5301 - Architectures: mips 5302 - Target: vcpu 5303 - Parameters: args[0] is reserved for future use (should be 0). 4791 + :Architectures: mips 4792 + :Target: vcpu 4793 + :Parameters: args[0] is reserved for future use (should be 0). 5304 4794 5305 4795 This capability allows the use of the host Floating Point Unit by the guest. It 5306 4796 allows the Config1.FP bit to be set to enable the FPU in the guest. Once this is 5307 - done the KVM_REG_MIPS_FPR_* and KVM_REG_MIPS_FCR_* registers can be accessed 5308 - (depending on the current guest FPU register mode), and the Status.FR, 4797 + done the ``KVM_REG_MIPS_FPR_*`` and ``KVM_REG_MIPS_FCR_*`` registers can be 4798 + accessed (depending on the current guest FPU register mode), and the Status.FR, 5309 4799 Config5.FRE bits are accessible via the KVM API and also from the guest, 5310 4800 depending on them being supported by the FPU. 
5311 4801 5312 4802 6.10 KVM_CAP_MIPS_MSA 4803 + --------------------- 5313 4804 5314 - Architectures: mips 5315 - Target: vcpu 5316 - Parameters: args[0] is reserved for future use (should be 0). 4805 + :Architectures: mips 4806 + :Target: vcpu 4807 + :Parameters: args[0] is reserved for future use (should be 0). 5317 4808 5318 4809 This capability allows the use of the MIPS SIMD Architecture (MSA) by the guest. 5319 4810 It allows the Config3.MSAP bit to be set to enable the use of MSA by the guest. 5320 - Once this is done the KVM_REG_MIPS_VEC_* and KVM_REG_MIPS_MSA_* registers can be 5321 - accessed, and the Config5.MSAEn bit is accessible via the KVM API and also from 5322 - the guest. 4811 + Once this is done the ``KVM_REG_MIPS_VEC_*`` and ``KVM_REG_MIPS_MSA_*`` 4812 + registers can be accessed, and the Config5.MSAEn bit is accessible via the 4813 + KVM API and also from the guest. 5323 4814 5324 4815 6.74 KVM_CAP_SYNC_REGS 5325 - Architectures: s390, x86 5326 - Target: s390: always enabled, x86: vcpu 5327 - Parameters: none 5328 - Returns: x86: KVM_CHECK_EXTENSION returns a bit-array indicating which register 5329 - sets are supported (bitfields defined in arch/x86/include/uapi/asm/kvm.h). 4816 + ---------------------- 4817 + 4818 + :Architectures: s390, x86 4819 + :Target: s390: always enabled, x86: vcpu 4820 + :Parameters: none 4821 + :Returns: x86: KVM_CHECK_EXTENSION returns a bit-array indicating which register 4822 + sets are supported 4823 + (bitfields defined in arch/x86/include/uapi/asm/kvm.h). 5330 4824 5331 4825 As described above in the kvm_sync_regs struct info in section 5 (kvm_run): 5332 4826 KVM_CAP_SYNC_REGS "allow[s] userspace to access certain guest registers ··· 5347 4825 For s390 specifics, please refer to the source code. 5348 4826 5349 4827 For x86: 4828 + 5350 4829 - the register sets to be copied out to kvm_run are selectable 5351 4830 by userspace (rather that all sets being copied out for every exit). 
5352 4831 - vcpu_events are available in addition to regs and sregs. ··· 5364 4841 5365 4842 Unused bitfields in the bitarrays must be set to zero. 5366 4843 5367 - struct kvm_sync_regs { 4844 + :: 4845 + 4846 + struct kvm_sync_regs { 5368 4847 struct kvm_regs regs; 5369 4848 struct kvm_sregs sregs; 5370 4849 struct kvm_vcpu_events events; 5371 - }; 4850 + }; 5372 4851 5373 4852 6.75 KVM_CAP_PPC_IRQ_XIVE 4853 + ------------------------- 5374 4854 5375 - Architectures: ppc 5376 - Target: vcpu 5377 - Parameters: args[0] is the XIVE device fd 5378 - args[1] is the XIVE CPU number (server ID) for this vcpu 4855 + :Architectures: ppc 4856 + :Target: vcpu 4857 + :Parameters: args[0] is the XIVE device fd; 4858 + args[1] is the XIVE CPU number (server ID) for this vcpu 5379 4859 5380 4860 This capability connects the vcpu to an in-kernel XIVE device. 5381 4861 5382 4862 7. Capabilities that can be enabled on VMs 5383 - ------------------------------------------ 4863 + ========================================== 5384 4864 5385 4865 There are certain capabilities that change the behavior of the virtual 5386 4866 machine when enabled. To enable them, please see section 4.37. Below ··· 5392 4866 5393 4867 The following information is provided along with the description: 5394 4868 5395 - Architectures: which instruction set architectures provide this ioctl. 4869 + Architectures: 4870 + which instruction set architectures provide this ioctl. 5396 4871 x86 includes both i386 and x86_64. 5397 4872 5398 - Parameters: what parameters are accepted by the capability. 4873 + Parameters: 4874 + what parameters are accepted by the capability. 5399 4875 5400 - Returns: the return value. General error numbers (EBADF, ENOMEM, EINVAL) 4876 + Returns: 4877 + the return value. General error numbers (EBADF, ENOMEM, EINVAL) 5401 4878 are not detailed, but errors with specific meanings are. 
5402 4879 5403 4880 5404 4881 7.1 KVM_CAP_PPC_ENABLE_HCALL 4882 + ---------------------------- 5405 4883 5406 - Architectures: ppc 5407 - Parameters: args[0] is the sPAPR hcall number 5408 - args[1] is 0 to disable, 1 to enable in-kernel handling 4884 + :Architectures: ppc 4885 + :Parameters: args[0] is the sPAPR hcall number; 4886 + args[1] is 0 to disable, 1 to enable in-kernel handling 5409 4887 5410 4888 This capability controls whether individual sPAPR hypercalls (hcalls) 5411 4889 get handled by the kernel or not. Enabling or disabling in-kernel ··· 5427 4897 error. 5428 4898 5429 4899 7.2 KVM_CAP_S390_USER_SIGP 4900 + -------------------------- 5430 4901 5431 - Architectures: s390 5432 - Parameters: none 4902 + :Architectures: s390 4903 + :Parameters: none 5433 4904 5434 4905 This capability controls which SIGP orders will be handled completely in user 5435 4906 space. With this capability enabled, all fast orders will be handled completely 5436 4907 in the kernel: 4908 + 5437 4909 - SENSE 5438 4910 - SENSE RUNNING 5439 4911 - EXTERNAL CALL ··· 5449 4917 old way of handling SIGP orders is used (partially in kernel and user space). 5450 4918 5451 4919 7.3 KVM_CAP_S390_VECTOR_REGISTERS 4920 + --------------------------------- 5452 4921 5453 - Architectures: s390 5454 - Parameters: none 5455 - Returns: 0 on success, negative value on error 4922 + :Architectures: s390 4923 + :Parameters: none 4924 + :Returns: 0 on success, negative value on error 5456 4925 5457 4926 Allows use of the vector registers introduced with z13 processor, and 5458 4927 provides for the synchronization between host and user space. Will 5459 4928 return -EINVAL if the machine does not support vectors. 5460 4929 5461 4930 7.4 KVM_CAP_S390_USER_STSI 4931 + -------------------------- 5462 4932 5463 - Architectures: s390 5464 - Parameters: none 4933 + :Architectures: s390 4934 + :Parameters: none 5465 4935 5466 4936 This capability allows post-handlers for the STSI instruction. 
After 5467 4937 initial handling in the kernel, KVM exits to user space with 5468 4938 KVM_EXIT_S390_STSI to allow user space to insert further data. 5469 4939 5470 4940 Before exiting to userspace, kvm handlers should fill in s390_stsi field of 5471 - vcpu->run: 5472 - struct { 4941 + vcpu->run:: 4942 + 4943 + struct { 5473 4944 __u64 addr; 5474 4945 __u8 ar; 5475 4946 __u8 reserved; 5476 4947 __u8 fc; 5477 4948 __u8 sel1; 5478 4949 __u16 sel2; 5479 - } s390_stsi; 4950 + } s390_stsi; 5480 4951 5481 - @addr - guest address of STSI SYSIB 5482 - @fc - function code 5483 - @sel1 - selector 1 5484 - @sel2 - selector 2 5485 - @ar - access register number 4952 + @addr - guest address of STSI SYSIB 4953 + @fc - function code 4954 + @sel1 - selector 1 4955 + @sel2 - selector 2 4956 + @ar - access register number 5486 4957 5487 4958 KVM handlers should exit to userspace with rc = -EREMOTE. 5488 4959 5489 4960 7.5 KVM_CAP_SPLIT_IRQCHIP 4961 + ------------------------- 5490 4962 5491 - Architectures: x86 5492 - Parameters: args[0] - number of routes reserved for userspace IOAPICs 5493 - Returns: 0 on success, -1 on error 4963 + :Architectures: x86 4964 + :Parameters: args[0] - number of routes reserved for userspace IOAPICs 4965 + :Returns: 0 on success, -1 on error 5494 4966 5495 4967 Create a local apic for each processor in the kernel. This can be used 5496 4968 instead of KVM_CREATE_IRQCHIP if the userspace VMM wishes to emulate the ··· 5511 4975 kernel (i.e. KVM_CREATE_IRQCHIP has already been called). 5512 4976 5513 4977 7.6 KVM_CAP_S390_RI 4978 + ------------------- 5514 4979 5515 - Architectures: s390 5516 - Parameters: none 4980 + :Architectures: s390 4981 + :Parameters: none 5517 4982 5518 4983 Allows use of runtime-instrumentation introduced with zEC12 processor. 5519 4984 Will return -EINVAL if the machine does not support runtime-instrumentation. 5520 4985 Will return -EBUSY if a VCPU has already been created. 
5521 4986 5522 4987 7.7 KVM_CAP_X2APIC_API 4988 + ---------------------- 5523 4989 5524 - Architectures: x86 5525 - Parameters: args[0] - features that should be enabled 5526 - Returns: 0 on success, -EINVAL when args[0] contains invalid features 4990 + :Architectures: x86 4991 + :Parameters: args[0] - features that should be enabled 4992 + :Returns: 0 on success, -EINVAL when args[0] contains invalid features 5527 4993 5528 - Valid feature flags in args[0] are 4994 + Valid feature flags in args[0] are:: 5529 4995 5530 - #define KVM_X2APIC_API_USE_32BIT_IDS (1ULL << 0) 5531 - #define KVM_X2APIC_API_DISABLE_BROADCAST_QUIRK (1ULL << 1) 4996 + #define KVM_X2APIC_API_USE_32BIT_IDS (1ULL << 0) 4997 + #define KVM_X2APIC_API_DISABLE_BROADCAST_QUIRK (1ULL << 1) 5532 4998 5533 4999 Enabling KVM_X2APIC_API_USE_32BIT_IDS changes the behavior of 5534 5000 KVM_SET_GSI_ROUTING, KVM_SIGNAL_MSI, KVM_SET_LAPIC, and KVM_GET_LAPIC, ··· 5544 5006 where 0xff represents CPUs 0-7 in cluster 0. 5545 5007 5546 5008 7.8 KVM_CAP_S390_USER_INSTR0 5009 + ---------------------------- 5547 5010 5548 - Architectures: s390 5549 - Parameters: none 5011 + :Architectures: s390 5012 + :Parameters: none 5550 5013 5551 5014 With this capability enabled, all illegal instructions 0x0000 (2 bytes) will 5552 5015 be intercepted and forwarded to user space. User space can use this ··· 5559 5020 created and are running. 5560 5021 5561 5022 7.9 KVM_CAP_S390_GS 5023 + ------------------- 5562 5024 5563 - Architectures: s390 5564 - Parameters: none 5565 - Returns: 0 on success; -EINVAL if the machine does not support 5566 - guarded storage; -EBUSY if a VCPU has already been created. 5025 + :Architectures: s390 5026 + :Parameters: none 5027 + :Returns: 0 on success; -EINVAL if the machine does not support 5028 + guarded storage; -EBUSY if a VCPU has already been created. 5567 5029 5568 5030 Allows use of guarded storage for the KVM guest. 
5569 5031 5570 5032 7.10 KVM_CAP_S390_AIS 5033 + --------------------- 5571 5034 5572 - Architectures: s390 5573 - Parameters: none 5035 + :Architectures: s390 5036 + :Parameters: none 5574 5037 5575 5038 Allow use of adapter-interruption suppression. 5576 - Returns: 0 on success; -EBUSY if a VCPU has already been created. 5039 + :Returns: 0 on success; -EBUSY if a VCPU has already been created. 5577 5040 5578 5041 7.11 KVM_CAP_PPC_SMT 5042 + -------------------- 5579 5043 5580 - Architectures: ppc 5581 - Parameters: vsmt_mode, flags 5044 + :Architectures: ppc 5045 + :Parameters: vsmt_mode, flags 5582 5046 5583 5047 Enabling this capability on a VM provides userspace with a way to set 5584 5048 the desired virtual SMT mode (i.e. the number of virtual CPUs per ··· 5596 5054 modes are available. 5597 5055 5598 5056 7.12 KVM_CAP_PPC_FWNMI 5057 + ---------------------- 5599 5058 5600 - Architectures: ppc 5601 - Parameters: none 5059 + :Architectures: ppc 5060 + :Parameters: none 5602 5061 5603 5062 With this capability a machine check exception in the guest address 5604 5063 space will cause KVM to exit the guest with NMI exit reason. This ··· 5608 5065 branch to guests' 0x200 interrupt vector. 
5609 5066 5610 5067 7.13 KVM_CAP_X86_DISABLE_EXITS 5068 + ------------------------------ 5611 5069 5612 - Architectures: x86 5613 - Parameters: args[0] defines which exits are disabled 5614 - Returns: 0 on success, -EINVAL when args[0] contains invalid exits 5070 + :Architectures: x86 5071 + :Parameters: args[0] defines which exits are disabled 5072 + :Returns: 0 on success, -EINVAL when args[0] contains invalid exits 5615 5073 5616 - Valid bits in args[0] are 5074 + Valid bits in args[0] are:: 5617 5075 5618 - #define KVM_X86_DISABLE_EXITS_MWAIT (1 << 0) 5619 - #define KVM_X86_DISABLE_EXITS_HLT (1 << 1) 5620 - #define KVM_X86_DISABLE_EXITS_PAUSE (1 << 2) 5621 - #define KVM_X86_DISABLE_EXITS_CSTATE (1 << 3) 5076 + #define KVM_X86_DISABLE_EXITS_MWAIT (1 << 0) 5077 + #define KVM_X86_DISABLE_EXITS_HLT (1 << 1) 5078 + #define KVM_X86_DISABLE_EXITS_PAUSE (1 << 2) 5079 + #define KVM_X86_DISABLE_EXITS_CSTATE (1 << 3) 5622 5080 5623 5081 Enabling this capability on a VM provides userspace with a way to no 5624 5082 longer intercept some instructions for improved latency in some ··· 5631 5087 Do not enable KVM_FEATURE_PV_UNHALT if you disable HLT exits. 5632 5088 5633 5089 7.14 KVM_CAP_S390_HPAGE_1M 5090 + -------------------------- 5634 5091 5635 - Architectures: s390 5636 - Parameters: none 5637 - Returns: 0 on success, -EINVAL if hpage module parameter was not set 5638 - or cmma is enabled, or the VM has the KVM_VM_S390_UCONTROL 5639 - flag set 5092 + :Architectures: s390 5093 + :Parameters: none 5094 + :Returns: 0 on success, -EINVAL if hpage module parameter was not set 5095 + or cmma is enabled, or the VM has the KVM_VM_S390_UCONTROL 5096 + flag set 5640 5097 5641 5098 With this capability the KVM support for memory backing with 1m pages 5642 5099 through hugetlbfs can be enabled for a VM. After the capability is ··· 5649 5104 this capability, the VM will not be able to run. 
5650 5105 5651 5106 7.15 KVM_CAP_MSR_PLATFORM_INFO 5107 + ------------------------------ 5652 5108 5653 - Architectures: x86 5654 - Parameters: args[0] whether feature should be enabled or not 5109 + :Architectures: x86 5110 + :Parameters: args[0] whether feature should be enabled or not 5655 5111 5656 5112 With this capability, a guest may read the MSR_PLATFORM_INFO MSR. Otherwise, 5657 5113 a #GP would be raised when the guest tries to access. Currently, this 5658 5114 capability does not enable write permissions of this MSR for the guest. 5659 5115 5660 5116 7.16 KVM_CAP_PPC_NESTED_HV 5117 + -------------------------- 5661 5118 5662 - Architectures: ppc 5663 - Parameters: none 5664 - Returns: 0 on success, -EINVAL when the implementation doesn't support 5665 - nested-HV virtualization. 5119 + :Architectures: ppc 5120 + :Parameters: none 5121 + :Returns: 0 on success, -EINVAL when the implementation doesn't support 5122 + nested-HV virtualization. 5666 5123 5667 5124 HV-KVM on POWER9 and later systems allows for "nested-HV" 5668 5125 virtualization, which provides a way for a guest VM to run guests that ··· 5674 5127 kvm-hv module parameter. 5675 5128 5676 5129 7.17 KVM_CAP_EXCEPTION_PAYLOAD 5130 + ------------------------------ 5677 5131 5678 - Architectures: x86 5679 - Parameters: args[0] whether feature should be enabled or not 5132 + :Architectures: x86 5133 + :Parameters: args[0] whether feature should be enabled or not 5680 5134 5681 5135 With this capability enabled, CR2 will not be modified prior to the 5682 5136 emulated VM-exit when L1 intercepts a #PF exception that occurs in ··· 5688 5140 faulting address (or the new DR6 bits*) will be reported in the 5689 5141 exception_payload field. Similarly, when userspace injects a #PF (or 5690 5142 #DB) into L2 using KVM_SET_VCPU_EVENTS, it is expected to set 5691 - exception.has_payload and to put the faulting address (or the new DR6 5692 - bits*) in the exception_payload field. 
5143 + exception.has_payload and to put the faulting address - or the new DR6 5144 + bits\ [#]_ - in the exception_payload field. 5693 5145 5694 5146 This capability also enables exception.pending in struct 5695 5147 kvm_vcpu_events, which allows userspace to distinguish between pending 5696 5148 and injected exceptions. 5697 5149 5698 5150 5699 - * For the new DR6 bits, note that bit 16 is set iff the #DB exception 5700 - will clear DR6.RTM. 5151 + .. [#] For the new DR6 bits, note that bit 16 is set iff the #DB exception 5152 + will clear DR6.RTM. 5701 5153 5702 5154 7.18 KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2 5703 5155 5704 - Architectures: x86, arm, arm64, mips 5705 - Parameters: args[0] whether feature should be enabled or not 5156 + :Architectures: x86, arm, arm64, mips 5157 + :Parameters: args[0] whether feature should be enabled or not 5706 5158 5707 5159 With this capability enabled, KVM_GET_DIRTY_LOG will not automatically 5708 5160 clear and write-protect all pages that are returned as dirty. ··· 5729 5181 Userspace should not try to use KVM_CAP_MANUAL_DIRTY_LOG_PROTECT. 5730 5182 5731 5183 8. Other capabilities. 5732 - ---------------------- 5184 + ====================== 5733 5185 5734 5186 This section lists capabilities that give information about other 5735 5187 features of the KVM implementation. 5736 5188 5737 5189 8.1 KVM_CAP_PPC_HWRNG 5190 + --------------------- 5738 5191 5739 - Architectures: ppc 5192 + :Architectures: ppc 5740 5193 5741 5194 This capability, if KVM_CHECK_EXTENSION indicates that it is 5742 5195 available, means that that the kernel has an implementation of the ··· 5746 5197 with the KVM_CAP_PPC_ENABLE_HCALL capability. 
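Checking whether such a capability is available can be sketched as below. The helper name `kvm_cap_available` is an assumption made for illustration. It treats both "not supported" (0) and an ioctl error (-1) as unavailable, which is a simplification: some capabilities return a meaningful integer rather than a plain yes/no.

```c
#include <sys/ioctl.h>
#include <linux/kvm.h>

/*
 * Hypothetical helper: returns 1 if KVM_CHECK_EXTENSION reports a
 * positive value for `cap` on `fd`, and 0 otherwise (unsupported
 * capability or failed ioctl).
 */
static int kvm_cap_available(int fd, long cap)
{
    int r = ioctl(fd, KVM_CHECK_EXTENSION, cap);

    return r > 0;
}
```

A caller would pass the fd obtained from open("/dev/kvm"), for example `kvm_cap_available(kvm_fd, KVM_CAP_PPC_HWRNG)`, before relying on the feature.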
5747 5198 5748 5199 8.2 KVM_CAP_HYPERV_SYNIC 5200 + ------------------------ 5749 5201 5750 - Architectures: x86 5202 + :Architectures: x86 5203 + 5751 5204 This capability, if KVM_CHECK_EXTENSION indicates that it is 5752 5205 available, means that that the kernel has an implementation of the 5753 5206 Hyper-V Synthetic interrupt controller(SynIC). Hyper-V SynIC is ··· 5761 5210 by the CPU, as it's incompatible with SynIC auto-EOI behavior. 5762 5211 5763 5212 8.3 KVM_CAP_PPC_RADIX_MMU 5213 + ------------------------- 5764 5214 5765 - Architectures: ppc 5215 + :Architectures: ppc 5766 5216 5767 5217 This capability, if KVM_CHECK_EXTENSION indicates that it is 5768 5218 available, means that that the kernel can support guests using the ··· 5771 5219 processor). 5772 5220 5773 5221 8.4 KVM_CAP_PPC_HASH_MMU_V3 5222 + --------------------------- 5774 5223 5775 - Architectures: ppc 5224 + :Architectures: ppc 5776 5225 5777 5226 This capability, if KVM_CHECK_EXTENSION indicates that it is 5778 5227 available, means that that the kernel can support guests using the ··· 5781 5228 the POWER9 processor), including in-memory segment tables. 5782 5229 5783 5230 8.5 KVM_CAP_MIPS_VZ 5231 + ------------------- 5784 5232 5785 - Architectures: mips 5233 + :Architectures: mips 5786 5234 5787 5235 This capability, if KVM_CHECK_EXTENSION on the main kvm handle indicates that 5788 5236 it is available, means that full hardware assisted virtualization capabilities ··· 5801 5247 possibility of other hardware assisted virtualization implementations which 5802 5248 may be incompatible with the MIPS VZ ASE. 5803 5249 5804 - 0: The trap & emulate implementation is in use to run guest code in user 5250 + == ========================================================================== 5251 + 0 The trap & emulate implementation is in use to run guest code in user 5805 5252 mode. Guest virtual memory segments are rearranged to fit the guest in the 5806 5253 user mode address space. 
5807 5254 5808 - 1: The MIPS VZ ASE is in use, providing full hardware assisted 5255 + 1 The MIPS VZ ASE is in use, providing full hardware assisted 5809 5256 virtualization, including standard guest virtual memory segments. 5257 + == ========================================================================== 5810 5258 5811 5259 8.6 KVM_CAP_MIPS_TE 5260 + ------------------- 5812 5261 5813 - Architectures: mips 5262 + :Architectures: mips 5814 5263 5815 5264 This capability, if KVM_CHECK_EXTENSION on the main kvm handle indicates that 5816 5265 it is available, means that the trap & emulate implementation is available to ··· 5825 5268 available, it means that the VM is using trap & emulate. 5826 5269 5827 5270 8.7 KVM_CAP_MIPS_64BIT 5271 + ---------------------- 5828 5272 5829 - Architectures: mips 5273 + :Architectures: mips 5830 5274 5831 5275 This capability indicates the supported architecture type of the guest, i.e. the 5832 5276 supported register and address width. ··· 5837 5279 be checked specifically against known values (see below). All other values are 5838 5280 reserved. 5839 5281 5840 - 0: MIPS32 or microMIPS32. 5282 + == ======================================================================== 5283 + 0 MIPS32 or microMIPS32. 5841 5284 Both registers and addresses are 32-bits wide. 5842 5285 It will only be possible to run 32-bit guest code. 5843 5286 5844 - 1: MIPS64 or microMIPS64 with access only to 32-bit compatibility segments. 5287 + 1 MIPS64 or microMIPS64 with access only to 32-bit compatibility segments. 5845 5288 Registers are 64-bits wide, but addresses are 32-bits wide. 5846 5289 64-bit guest code may run but cannot access MIPS64 memory segments. 5847 5290 It will also be possible to run 32-bit guest code. 5848 5291 5849 - 2: MIPS64 or microMIPS64 with access to all address segments. 5292 + 2 MIPS64 or microMIPS64 with access to all address segments. 5850 5293 Both registers and addresses are 64-bits wide. 
5851 5294 It will be possible to run 64-bit or 32-bit guest code. 5295 + == ======================================================================== 5852 5296 5853 5297 8.9 KVM_CAP_ARM_USER_IRQ 5298 + ------------------------ 5854 5299 5855 - Architectures: arm, arm64 5300 + :Architectures: arm, arm64 5301 + 5856 5302 This capability, if KVM_CHECK_EXTENSION indicates that it is available, means 5857 5303 that if userspace creates a VM without an in-kernel interrupt controller, it 5858 5304 will be notified of changes to the output level of in-kernel emulated devices, ··· 5883 5321 number larger than 0 indicating the version of this capability is implemented 5884 5322 and thereby which bits in in run->s.regs.device_irq_level can signal values. 5885 5323 5886 - Currently the following bits are defined for the device_irq_level bitmap: 5324 + Currently the following bits are defined for the device_irq_level bitmap:: 5887 5325 5888 5326 KVM_CAP_ARM_USER_IRQ >= 1: 5889 5327 ··· 5896 5334 listed above. 5897 5335 5898 5336 8.10 KVM_CAP_PPC_SMT_POSSIBLE 5337 + ----------------------------- 5899 5338 5900 - Architectures: ppc 5339 + :Architectures: ppc 5901 5340 5902 5341 Querying this capability returns a bitmap indicating the possible 5903 5342 virtual SMT modes that can be set using KVM_CAP_PPC_SMT. If bit N ··· 5906 5343 available. 5907 5344 5908 5345 8.11 KVM_CAP_HYPERV_SYNIC2 5346 + -------------------------- 5909 5347 5910 - Architectures: x86 5348 + :Architectures: x86 5911 5349 5912 5350 This capability enables a newer version of Hyper-V Synthetic interrupt 5913 5351 controller (SynIC). The only difference with KVM_CAP_HYPERV_SYNIC is that KVM ··· 5916 5352 writing to the respective MSRs. 5917 5353 5918 5354 8.12 KVM_CAP_HYPERV_VP_INDEX 5355 + ---------------------------- 5919 5356 5920 - Architectures: x86 5357 + :Architectures: x86 5921 5358 5922 5359 This capability indicates that userspace can load HV_X64_MSR_VP_INDEX msr. 
Its 5923 5360 value is used to denote the target vcpu for a SynIC interrupt. For ··· 5926 5361 capability is absent, userspace can still query this msr's value. 5927 5362 5928 5363 8.13 KVM_CAP_S390_AIS_MIGRATION 5364 + ------------------------------- 5929 5365 5930 - Architectures: s390 5931 - Parameters: none 5366 + :Architectures: s390 5367 + :Parameters: none 5932 5368 5933 5369 This capability indicates if the flic device will be able to get/set the 5934 5370 AIS states for migration via the KVM_DEV_FLIC_AISM_ALL attribute and allows 5935 5371 to discover this without having to create a flic device. 5936 5372 5937 5373 8.14 KVM_CAP_S390_PSW 5374 + --------------------- 5938 5375 5939 - Architectures: s390 5376 + :Architectures: s390 5940 5377 5941 5378 This capability indicates that the PSW is exposed via the kvm_run structure. 5942 5379 5943 5380 8.15 KVM_CAP_S390_GMAP 5381 + ---------------------- 5944 5382 5945 - Architectures: s390 5383 + :Architectures: s390 5946 5384 5947 5385 This capability indicates that the user space memory used as guest mapping can 5948 5386 be anywhere in the user memory address space, as long as the memory slots are 5949 5387 aligned and sized to a segment (1MB) boundary. 5950 5388 5951 5389 8.16 KVM_CAP_S390_COW 5390 + --------------------- 5952 5391 5953 - Architectures: s390 5392 + :Architectures: s390 5954 5393 5955 5394 This capability indicates that the user space memory used as guest mapping can 5956 5395 use copy-on-write semantics as well as dirty pages tracking via read-only page 5957 5396 tables. 5958 5397 5959 5398 8.17 KVM_CAP_S390_BPB 5399 + --------------------- 5960 5400 5961 - Architectures: s390 5401 + :Architectures: s390 5962 5402 5963 5403 This capability indicates that kvm will implement the interfaces to handle 5964 5404 reset, migration and nested KVM for branch prediction blocking. The stfle 5965 5405 facility 82 should not be provided to the guest without this capability. 
5966 5406 5967 5407 8.18 KVM_CAP_HYPERV_TLBFLUSH 5408 + ---------------------------- 5968 5409 5969 - Architectures: x86 5410 + :Architectures: x86 5970 5411 5971 5412 This capability indicates that KVM supports paravirtualized Hyper-V TLB Flush 5972 5413 hypercalls: ··· 5980 5409 HvFlushVirtualAddressList, HvFlushVirtualAddressListEx. 5981 5410 5982 5411 8.19 KVM_CAP_ARM_INJECT_SERROR_ESR 5412 + ---------------------------------- 5983 5413 5984 - Architectures: arm, arm64 5414 + :Architectures: arm, arm64 5985 5415 5986 5416 This capability indicates that userspace can specify (via the 5987 5417 KVM_SET_VCPU_EVENTS ioctl) the syndrome value reported to the guest when it ··· 5993 5421 AArch64, this value will be reported in the ISS field of ESR_ELx. 5994 5422 5995 5423 See KVM_CAP_VCPU_EVENTS for more details. 5996 - 8.20 KVM_CAP_HYPERV_SEND_IPI 5997 5424 5998 - Architectures: x86 5425 + 8.20 KVM_CAP_HYPERV_SEND_IPI 5426 + ---------------------------- 5427 + 5428 + :Architectures: x86 5999 5429 6000 5430 This capability indicates that KVM supports paravirtualized Hyper-V IPI send 6001 5431 hypercalls: 6002 5432 HvCallSendSyntheticClusterIpi, HvCallSendSyntheticClusterIpiEx. 6003 - 8.21 KVM_CAP_HYPERV_DIRECT_TLBFLUSH 6004 5433 6005 - Architecture: x86 5434 + 8.21 KVM_CAP_HYPERV_DIRECT_TLBFLUSH 5435 + ----------------------------------- 5436 + 5437 + :Architecture: x86 6006 5438 6007 5439 This capability indicates that KVM running on top of Hyper-V hypervisor 6008 5440 enables Direct TLB flush for its guests meaning that TLB flush
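
As an illustration of the KVM_CAP_MIPS_64BIT value table in section 8.7 of this file, the following is a minimal sketch of how userspace might interpret the value returned by KVM_CHECK_EXTENSION. The struct and function names are hypothetical helpers and are not part of the KVM UAPI. The sketch treats every value other than 0, 1 and 2 as reserved, as the text requires.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper type; not defined by the kernel headers. */
struct mips_guest_widths {
	int reg_bits;        /* guest register width in bits */
	int addr_bits;       /* guest address width in bits */
	bool can_run_64bit;  /* whether 64-bit guest code may run */
};

/*
 * Interpret the value reported for KVM_CAP_MIPS_64BIT.
 * Returns false for reserved values, which callers should
 * not try to interpret.
 */
static bool mips_64bit_cap_decode(int cap, struct mips_guest_widths *out)
{
	assert(out != NULL);

	switch (cap) {
	case 0: /* MIPS32 or microMIPS32 */
		*out = (struct mips_guest_widths){ 32, 32, false };
		return true;
	case 1: /* MIPS64, 32-bit compatibility segments only */
		*out = (struct mips_guest_widths){ 64, 32, true };
		return true;
	case 2: /* MIPS64 with access to all address segments */
		*out = (struct mips_guest_widths){ 64, 64, true };
		return true;
	default: /* reserved */
		return false;
	}
}
```

Checking against the specific known values, rather than testing for non-zero, follows the documented requirement that the result "be checked specifically against known values".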

+19 -9

Documentation/virt/kvm/arm/hyp-abi.txt Documentation/virt/kvm/arm/hyp-abi.rst

··· 1 - * Internal ABI between the kernel and HYP 1 + .. SPDX-License-Identifier: GPL-2.0 2 + 3 + ======================================= 4 + Internal ABI between the kernel and HYP 5 + ======================================= 2 6 3 7 This file documents the interaction between the Linux kernel and the 4 8 hypervisor layer when running Linux as a hypervisor (for example ··· 23 19 Unless specified otherwise, any built-in hypervisor must implement 24 20 these functions (see arch/arm{,64}/include/asm/virt.h): 25 21 26 - * r0/x0 = HVC_SET_VECTORS 27 - r1/x1 = vectors 22 + * :: 23 + 24 + r0/x0 = HVC_SET_VECTORS 25 + r1/x1 = vectors 28 26 29 27 Set HVBAR/VBAR_EL2 to 'vectors' to enable a hypervisor. 'vectors' 30 28 must be a physical address, and respect the alignment requirements 31 29 of the architecture. Only implemented by the initial stubs, not by 32 30 Linux hypervisors. 33 31 34 - * r0/x0 = HVC_RESET_VECTORS 32 + * :: 33 + 34 + r0/x0 = HVC_RESET_VECTORS 35 35 36 36 Turn HYP/EL2 MMU off, and reset HVBAR/VBAR_EL2 to the initials 37 37 stubs' exception vector value. This effectively disables an existing 38 38 hypervisor. 39 39 40 - * r0/x0 = HVC_SOFT_RESTART 41 - r1/x1 = restart address 42 - x2 = x0's value when entering the next payload (arm64) 43 - x3 = x1's value when entering the next payload (arm64) 44 - x4 = x2's value when entering the next payload (arm64) 40 + * :: 41 + 42 + r0/x0 = HVC_SOFT_RESTART 43 + r1/x1 = restart address 44 + x2 = x0's value when entering the next payload (arm64) 45 + x3 = x1's value when entering the next payload (arm64) 46 + x4 = x2's value when entering the next payload (arm64) 45 47 46 48 Mask all exceptions, disable the MMU, move the arguments into place 47 49 (arm64 only), and jump to the restart address while at HYP/EL2. This

+12

Documentation/virt/kvm/arm/index.rst

···
+ .. SPDX-License-Identifier: GPL-2.0
+
+ ===
+ ARM
+ ===
+
+ .. toctree::
+    :maxdepth: 2
+
+    hyp-abi
+    psci
+    pvtime

+31 -15

Documentation/virt/kvm/arm/psci.txt Documentation/virt/kvm/arm/psci.rst

··· 1 + .. SPDX-License-Identifier: GPL-2.0 2 + 3 + ========================================= 4 + Power State Coordination Interface (PSCI) 5 + ========================================= 6 + 1 7 KVM implements the PSCI (Power State Coordination Interface) 2 8 specification in order to provide services such as CPU on/off, reset 3 9 and power-off to the guest. ··· 36 30 - Affects the whole VM (even if the register view is per-vcpu) 37 31 38 32 * KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_1: 39 - Holds the state of the firmware support to mitigate CVE-2017-5715, as 40 - offered by KVM to the guest via a HVC call. The workaround is described 41 - under SMCCC_ARCH_WORKAROUND_1 in [1]. 33 + Holds the state of the firmware support to mitigate CVE-2017-5715, as 34 + offered by KVM to the guest via a HVC call. The workaround is described 35 + under SMCCC_ARCH_WORKAROUND_1 in [1]. 36 + 42 37 Accepted values are: 43 - KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_1_NOT_AVAIL: KVM does not offer 38 + 39 + KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_1_NOT_AVAIL: 40 + KVM does not offer 44 41 firmware support for the workaround. The mitigation status for the 45 42 guest is unknown. 46 - KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_1_AVAIL: The workaround HVC call is 43 + KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_1_AVAIL: 44 + The workaround HVC call is 47 45 available to the guest and required for the mitigation. 48 - KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_1_NOT_REQUIRED: The workaround HVC call 46 + KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_1_NOT_REQUIRED: 47 + The workaround HVC call 49 48 is available to the guest, but it is not needed on this VCPU. 50 49 51 50 * KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2: 52 - Holds the state of the firmware support to mitigate CVE-2018-3639, as 53 - offered by KVM to the guest via a HVC call. The workaround is described 54 - under SMCCC_ARCH_WORKAROUND_2 in [1]. 51 + Holds the state of the firmware support to mitigate CVE-2018-3639, as 52 + offered by KVM to the guest via a HVC call. 
The workaround is described 53 + under SMCCC_ARCH_WORKAROUND_2 in [1]_. 54 + 55 55 Accepted values are: 56 - KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2_NOT_AVAIL: A workaround is not 56 + 57 + KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2_NOT_AVAIL: 58 + A workaround is not 57 59 available. KVM does not offer firmware support for the workaround. 58 - KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2_UNKNOWN: The workaround state is 60 + KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2_UNKNOWN: 61 + The workaround state is 59 62 unknown. KVM does not offer firmware support for the workaround. 60 - KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2_AVAIL: The workaround is available, 63 + KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2_AVAIL: 64 + The workaround is available, 61 65 and can be disabled by a vCPU. If 62 66 KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2_ENABLED is set, it is active for 63 67 this vCPU. 64 - KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2_NOT_REQUIRED: The workaround is 65 - always active on this vCPU or it is not needed. 68 + KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2_NOT_REQUIRED: 69 + The workaround is always active on this vCPU or it is not needed. 66 70 67 - [1] https://developer.arm.com/-/media/developer/pdf/ARM_DEN_0070A_Firmware_interfaces_for_mitigating_CVE-2017-5715.pdf 71 + .. [1] https://developer.arm.com/-/media/developer/pdf/ARM_DEN_0070A_Firmware_interfaces_for_mitigating_CVE-2017-5715.pdf

+70 -42

Documentation/virt/kvm/devices/arm-vgic-its.txt Documentation/virt/kvm/devices/arm-vgic-its.rst

··· 1 + .. SPDX-License-Identifier: GPL-2.0 2 + 3 + =============================================== 1 4 ARM Virtual Interrupt Translation Service (ITS) 2 5 =============================================== 3 6 ··· 15 12 a separate, non-overlapping MMIO region. 16 13 17 14 18 - Groups: 19 - KVM_DEV_ARM_VGIC_GRP_ADDR 15 + Groups 16 + ====== 17 + 18 + KVM_DEV_ARM_VGIC_GRP_ADDR 19 + ------------------------- 20 + 20 21 Attributes: 21 22 KVM_VGIC_ITS_ADDR_TYPE (rw, 64-bit) 22 23 Base address in the guest physical address space of the GICv3 ITS 23 24 control register frame. 24 25 This address needs to be 64K aligned and the region covers 128K. 26 + 25 27 Errors: 26 - -E2BIG: Address outside of addressable IPA range 27 - -EINVAL: Incorrectly aligned address 28 - -EEXIST: Address already configured 29 - -EFAULT: Invalid user pointer for attr->addr. 30 - -ENODEV: Incorrect attribute or the ITS is not supported. 28 + 29 + ======= ================================================= 30 + -E2BIG Address outside of addressable IPA range 31 + -EINVAL Incorrectly aligned address 32 + -EEXIST Address already configured 33 + -EFAULT Invalid user pointer for attr->addr. 34 + -ENODEV Incorrect attribute or the ITS is not supported. 35 + ======= ================================================= 31 36 32 37 33 - KVM_DEV_ARM_VGIC_GRP_CTRL 38 + KVM_DEV_ARM_VGIC_GRP_CTRL 39 + ------------------------- 40 + 34 41 Attributes: 35 42 KVM_DEV_ARM_VGIC_CTRL_INIT 36 43 request the initialization of the ITS, no additional parameter in ··· 71 58 "ITS Restore Sequence". 
72 59 73 60 Errors: 74 - -ENXIO: ITS not properly configured as required prior to setting 75 - this attribute 76 - -ENOMEM: Memory shortage when allocating ITS internal data 77 - -EINVAL: Inconsistent restored data 78 - -EFAULT: Invalid guest ram access 79 - -EBUSY: One or more VCPUS are running 80 - -EACCES: The virtual ITS is backed by a physical GICv4 ITS, and the 81 - state is not available 82 61 83 - KVM_DEV_ARM_VGIC_GRP_ITS_REGS 62 + ======= ========================================================== 63 + -ENXIO ITS not properly configured as required prior to setting 64 + this attribute 65 + -ENOMEM Memory shortage when allocating ITS internal data 66 + -EINVAL Inconsistent restored data 67 + -EFAULT Invalid guest ram access 68 + -EBUSY One or more VCPUS are running 69 + -EACCES The virtual ITS is backed by a physical GICv4 ITS, and the 70 + state is not available 71 + ======= ========================================================== 72 + 73 + KVM_DEV_ARM_VGIC_GRP_ITS_REGS 74 + ----------------------------- 75 + 84 76 Attributes: 85 77 The attr field of kvm_device_attr encodes the offset of the 86 78 ITS register, relative to the ITS control frame base address ··· 96 78 be accessed with full length. 97 79 98 80 Writes to read-only registers are ignored by the kernel except for: 81 + 99 82 - GITS_CREADR. It must be restored otherwise commands in the queue 100 83 will be re-executed after restoring CWRITER. GITS_CREADR must be 101 84 restored before restoring the GITS_CTLR which is likely to enable the ··· 110 91 111 92 For other registers, getting or setting a register has the same 112 93 effect as reading/writing the register on real hardware. 
113 - Errors: 114 - -ENXIO: Offset does not correspond to any supported register 115 - -EFAULT: Invalid user pointer for attr->addr 116 - -EINVAL: Offset is not 64-bit aligned 117 - -EBUSY: one or more VCPUS are running 118 94 119 - ITS Restore Sequence: 120 - ------------------------- 95 + Errors: 96 + 97 + ======= ==================================================== 98 + -ENXIO Offset does not correspond to any supported register 99 + -EFAULT Invalid user pointer for attr->addr 100 + -EINVAL Offset is not 64-bit aligned 101 + -EBUSY one or more VCPUS are running 102 + ======= ==================================================== 103 + 104 + ITS Restore Sequence: 105 + --------------------- 121 106 122 107 The following ordering must be followed when restoring the GIC and the ITS: 108 + 123 109 a) restore all guest memory and create vcpus 124 110 b) restore all redistributors 125 111 c) provide the ITS base address 126 112 (KVM_DEV_ARM_VGIC_GRP_ADDR) 127 113 d) restore the ITS in the following order: 128 - 1. Restore GITS_CBASER 129 - 2. Restore all other GITS_ registers, except GITS_CTLR! 130 - 3. Load the ITS table data (KVM_DEV_ARM_ITS_RESTORE_TABLES) 131 - 4. Restore GITS_CTLR 114 + 115 + 1. Restore GITS_CBASER 116 + 2. Restore all other ``GITS_`` registers, except GITS_CTLR! 117 + 3. Load the ITS table data (KVM_DEV_ARM_ITS_RESTORE_TABLES) 118 + 4. Restore GITS_CTLR 132 119 133 120 Then vcpus can be started. 134 121 135 - ITS Table ABI REV0: 136 - ------------------- 122 + ITS Table ABI REV0: 123 + ------------------- 137 124 138 125 Revision 0 of the ABI only supports the features of a virtual GICv3, and does 139 126 not support a virtual GICv4 with support for direct injection of virtual ··· 150 125 entries in the collection are listed in no particular order. 151 126 All entries are 8 bytes. 152 127 153 - Device Table Entry (DTE): 128 + Device Table Entry (DTE):: 154 129 155 - bits: | 63| 62 ... 49 | 48 ... 5 | 4 ... 
0 | 156 - values: | V | next | ITT_addr | Size | 130 + bits: | 63| 62 ... 49 | 48 ... 5 | 4 ... 0 | 131 + values: | V | next | ITT_addr | Size | 157 132 158 - where; 133 + where: 134 + 159 135 - V indicates whether the entry is valid. If not, other fields 160 136 are not meaningful. 161 137 - next: equals to 0 if this entry is the last one; otherwise it ··· 166 140 - Size specifies the supported number of bits for the EventID, 167 141 minus one 168 142 169 - Collection Table Entry (CTE): 143 + Collection Table Entry (CTE):: 170 144 171 - bits: | 63| 62 .. 52 | 51 ... 16 | 15 ... 0 | 172 - values: | V | RES0 | RDBase | ICID | 145 + bits: | 63| 62 .. 52 | 51 ... 16 | 15 ... 0 | 146 + values: | V | RES0 | RDBase | ICID | 173 147 174 148 where: 149 + 175 150 - V indicates whether the entry is valid. If not, other fields are 176 151 not meaningful. 177 152 - RES0: reserved field with Should-Be-Zero-or-Preserved behavior. 178 153 - RDBase is the PE number (GICR_TYPER.Processor_Number semantic), 179 154 - ICID is the collection ID 180 155 181 - Interrupt Translation Entry (ITE): 156 + Interrupt Translation Entry (ITE):: 182 157 183 - bits: | 63 ... 48 | 47 ... 16 | 15 ... 0 | 184 - values: | next | pINTID | ICID | 158 + bits: | 63 ... 48 | 47 ... 16 | 15 ... 0 | 159 + values: | next | pINTID | ICID | 185 160 186 161 where: 162 + 187 163 - next: equals to 0 if this entry is the last one; otherwise it corresponds 188 164 to the EventID offset to the next ITE capped by 2^16 -1. 189 165 - pINTID is the physical LPI ID; if zero, it means the entry is not valid 190 166 and other fields are not meaningful. 191 167 - ICID is the collection ID 192 168 193 - ITS Reset State: 194 - ---------------- 169 + ITS Reset State: 170 + ---------------- 195 171 196 172 RESET returns the ITS to the same state that it was when first created and 197 173 initialized. When the RESET command returns, the following things are
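
The three ITS Table ABI REV0 layouts shown in this file (DTE, CTE and ITE) can be unpacked with plain shifts and masks. The sketch below is illustrative only: the struct and function names are hypothetical, the entry is assumed to be already loaded into a host-order 64-bit value, and ITT_addr and RDBase are returned as raw field values without further interpretation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Extract bits [hi:lo] of an 8-byte table entry. */
static uint64_t its_bits(uint64_t raw, unsigned hi, unsigned lo)
{
	assert(hi >= lo && hi < 64 && hi - lo < 63);
	return (raw >> lo) & ((UINT64_C(1) << (hi - lo + 1)) - 1);
}

/* Hypothetical decoded views of the three entry types. */
struct its_dte { bool valid; uint16_t next; uint64_t itt_addr; unsigned eventid_bits; };
struct its_cte { bool valid; uint64_t rdbase; uint16_t icid; };
struct its_ite { bool valid; uint16_t next; uint32_t pintid; uint16_t icid; };

/* Device Table Entry: V[63] next[62:49] ITT_addr[48:5] Size[4:0] */
static struct its_dte its_decode_dte(uint64_t raw)
{
	struct its_dte e;
	e.valid = its_bits(raw, 63, 63) != 0;
	e.next = (uint16_t)its_bits(raw, 62, 49);
	e.itt_addr = its_bits(raw, 48, 5);
	/* Size holds the number of EventID bits, minus one. */
	e.eventid_bits = (unsigned)its_bits(raw, 4, 0) + 1;
	return e;
}

/* Collection Table Entry: V[63] RES0[62:52] RDBase[51:16] ICID[15:0] */
static struct its_cte its_decode_cte(uint64_t raw)
{
	struct its_cte e;
	e.valid = its_bits(raw, 63, 63) != 0;
	e.rdbase = its_bits(raw, 51, 16);
	e.icid = (uint16_t)its_bits(raw, 15, 0);
	return e;
}

/* Interrupt Translation Entry: next[63:48] pINTID[47:16] ICID[15:0] */
static struct its_ite its_decode_ite(uint64_t raw)
{
	struct its_ite e;
	e.next = (uint16_t)its_bits(raw, 63, 48);
	e.pintid = (uint32_t)its_bits(raw, 47, 16);
	e.icid = (uint16_t)its_bits(raw, 15, 0);
	/* An ITE has no V bit: a zero pINTID marks it invalid. */
	e.valid = e.pintid != 0;
	return e;
}
```

When an entry is not valid, the remaining fields carry no meaning, so a caller would normally test `valid` before using anything else.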

+86 -46

Documentation/virt/kvm/devices/arm-vgic-v3.txt Documentation/virt/kvm/devices/arm-vgic-v3.rst

··· 1 + .. SPDX-License-Identifier: GPL-2.0 2 + 3 + ============================================================== 1 4 ARM Virtual Generic Interrupt Controller v3 and later (VGICv3) 2 5 ============================================================== 3 6 4 7 5 8 Device types supported: 6 - KVM_DEV_TYPE_ARM_VGIC_V3 ARM Generic Interrupt Controller v3.0 9 + - KVM_DEV_TYPE_ARM_VGIC_V3 ARM Generic Interrupt Controller v3.0 7 10 8 11 Only one VGIC instance may be instantiated through this API. The created VGIC 9 12 will act as the VM interrupt controller, requiring emulated user-space devices ··· 18 15 19 16 Groups: 20 17 KVM_DEV_ARM_VGIC_GRP_ADDR 21 - Attributes: 18 + Attributes: 19 + 22 20 KVM_VGIC_V3_ADDR_TYPE_DIST (rw, 64-bit) 23 21 Base address in the guest physical address space of the GICv3 distributor 24 22 register mappings. Only valid for KVM_DEV_TYPE_ARM_VGIC_V3. ··· 33 29 This address needs to be 64K aligned. 34 30 35 31 KVM_VGIC_V3_ADDR_TYPE_REDIST_REGION (rw, 64-bit) 36 - The attribute data pointed to by kvm_device_attr.addr is a __u64 value: 37 - bits: | 63 .... 52 | 51 .... 16 | 15 - 12 |11 - 0 38 - values: | count | base | flags | index 32 + The attribute data pointed to by kvm_device_attr.addr is a __u64 value:: 33 + 34 + bits: | 63 .... 52 | 51 .... 16 | 15 - 12 |11 - 0 35 + values: | count | base | flags | index 36 + 39 37 - index encodes the unique redistributor region index 40 38 - flags: reserved for future use, currently 0 41 39 - base field encodes bits [51:16] of the guest physical base address 42 40 of the first redistributor in the region. 43 41 - count encodes the number of redistributors in the region. Must be 44 42 greater than 0. 43 + 45 44 There are two 64K pages for each redistributor in the region and 46 45 redistributors are laid out contiguously within the region. Regions 47 46 are filled with redistributors in the index order. The sum of all 48 47 region count fields must be greater than or equal to the number of 49 48 VCPUs. 
Redistributor regions must be registered in the incremental 50 49 index order, starting from index 0. 50 + 51 51 The characteristics of a specific redistributor region can be read 52 52 by presetting the index field in the attr data. 53 53 Only valid for KVM_DEV_TYPE_ARM_VGIC_V3. ··· 60 52 KVM_VGIC_V3_ADDR_TYPE_REDIST_REGION attributes. 61 53 62 54 Errors: 63 - -E2BIG: Address outside of addressable IPA range 64 - -EINVAL: Incorrectly aligned address, bad redistributor region 55 + 56 + ======= ============================================================= 57 + -E2BIG Address outside of addressable IPA range 58 + -EINVAL Incorrectly aligned address, bad redistributor region 65 59 count/index, mixed redistributor region attribute usage 66 - -EEXIST: Address already configured 67 - -ENOENT: Attempt to read the characteristics of a non existing 60 + -EEXIST Address already configured 61 + -ENOENT Attempt to read the characteristics of a non existing 68 62 redistributor region 69 - -ENXIO: The group or attribute is unknown/unsupported for this device 63 + -ENXIO The group or attribute is unknown/unsupported for this device 70 64 or hardware support is missing. 71 - -EFAULT: Invalid user pointer for attr->addr. 65 + -EFAULT Invalid user pointer for attr->addr. 66 + ======= ============================================================= 72 67 73 68 74 - KVM_DEV_ARM_VGIC_GRP_DIST_REGS 75 - KVM_DEV_ARM_VGIC_GRP_REDIST_REGS 76 - Attributes: 77 - The attr field of kvm_device_attr encodes two values: 78 - bits: | 63 .... 32 | 31 .... 0 | 79 - values: | mpidr | offset | 69 + KVM_DEV_ARM_VGIC_GRP_DIST_REGS, KVM_DEV_ARM_VGIC_GRP_REDIST_REGS 70 + Attributes: 71 + 72 + The attr field of kvm_device_attr encodes two values:: 73 + 74 + bits: | 63 .... 32 | 31 .... 0 | 75 + values: | mpidr | offset | 80 76 81 77 All distributor regs are (rw, 32-bit) and kvm_device_attr.addr points to a 82 78 __u32 value. 
64-bit registers must be accessed by separately accessing the ··· 105 93 redistributor is accessed. The mpidr is ignored for the distributor. 106 94 107 95 The mpidr encoding is based on the affinity information in the 108 - architecture defined MPIDR, and the field is encoded as follows: 96 + architecture defined MPIDR, and the field is encoded as follows:: 97 + 109 98 | 63 .... 56 | 55 .... 48 | 47 .... 40 | 39 .... 32 | 110 99 | Aff3 | Aff2 | Aff1 | Aff0 | 111 100 ··· 161 148 ignored. 162 149 163 150 Errors: 164 - -ENXIO: Getting or setting this register is not yet supported 165 - -EBUSY: One or more VCPUs are running 151 + 152 + ====== ===================================================== 153 + -ENXIO Getting or setting this register is not yet supported 154 + -EBUSY One or more VCPUs are running 155 + ====== ===================================================== 166 156 167 157 168 158 KVM_DEV_ARM_VGIC_GRP_CPU_SYSREGS 169 - Attributes: 170 - The attr field of kvm_device_attr encodes two values: 171 - bits: | 63 .... 32 | 31 .... 16 | 15 .... 0 | 172 - values: | mpidr | RES | instr | 159 + Attributes: 160 + 161 + The attr field of kvm_device_attr encodes two values:: 162 + 163 + bits: | 63 .... 32 | 31 .... 16 | 15 .... 0 | 164 + values: | mpidr | RES | instr | 173 165 174 166 The mpidr field encodes the CPU ID based on the affinity information in the 175 - architecture defined MPIDR, and the field is encoded as follows: 167 + architecture defined MPIDR, and the field is encoded as follows:: 168 + 176 169 | 63 .... 56 | 55 .... 48 | 47 .... 40 | 39 .... 32 | 177 170 | Aff3 | Aff2 | Aff1 | Aff0 | 178 171 179 172 The instr field encodes the system register to access based on the fields 180 173 defined in the A64 instruction set encoding for system register access 181 - (RES means the bits are reserved for future use and should be zero): 174 + (RES means the bits are reserved for future use and should be zero):: 182 175 183 176 | 15 ... 14 | 13 ... 11 | 10 ... 
7 | 6 ... 3 | 2 ... 0 | 184 177 | Op 0 | Op1 | CRn | CRm | Op2 | ··· 197 178 198 179 CPU interface registers access is not implemented for AArch32 mode. 199 180 Error -ENXIO is returned when accessed in AArch32 mode. 181 + 200 182 Errors: 201 - -ENXIO: Getting or setting this register is not yet supported 202 - -EBUSY: VCPU is running 203 - -EINVAL: Invalid mpidr or register value supplied 183 + 184 + ======= ===================================================== 185 + -ENXIO Getting or setting this register is not yet supported 186 + -EBUSY VCPU is running 187 + -EINVAL Invalid mpidr or register value supplied 188 + ======= ===================================================== 204 189 205 190 206 191 KVM_DEV_ARM_VGIC_GRP_NR_IRQS 207 - Attributes: 192 + Attributes: 193 + 208 194 A value describing the number of interrupts (SGI, PPI and SPI) for 209 195 this GIC instance, ranging from 64 to 1024, in increments of 32. 210 196 211 197 kvm_device_attr.addr points to a __u32 value. 212 198 213 199 Errors: 214 - -EINVAL: Value set is out of the expected range 215 - -EBUSY: Value has already be set. 200 + 201 + ======= ====================================== 202 + -EINVAL Value set is out of the expected range 203 + -EBUSY Value has already be set. 204 + ======= ====================================== 216 205 217 206 218 207 KVM_DEV_ARM_VGIC_GRP_CTRL 219 - Attributes: 208 + Attributes: 209 + 220 210 KVM_DEV_ARM_VGIC_CTRL_INIT 221 211 request the initialization of the VGIC, no additional parameter in 222 212 kvm_device_attr.addr. ··· 233 205 save all LPI pending bits into guest RAM pending tables. 234 206 235 207 The first kB of the pending table is not altered by this operation. 
208 + 236 209 Errors: 237 - -ENXIO: VGIC not properly configured as required prior to calling 238 - this attribute 239 - -ENODEV: no online VCPU 240 - -ENOMEM: memory shortage when allocating vgic internal data 241 - -EFAULT: Invalid guest ram access 242 - -EBUSY: One or more VCPUS are running 210 + 211 + ======= ======================================================== 212 + -ENXIO VGIC not properly configured as required prior to calling 213 + this attribute 214 + -ENODEV no online VCPU 215 + -ENOMEM memory shortage when allocating vgic internal data 216 + -EFAULT Invalid guest ram access 217 + -EBUSY One or more VCPUS are running 218 + ======= ======================================================== 243 219 244 220 245 221 KVM_DEV_ARM_VGIC_GRP_LEVEL_INFO 246 - Attributes: 247 - The attr field of kvm_device_attr encodes the following values: 248 - bits: | 63 .... 32 | 31 .... 10 | 9 .... 0 | 249 - values: | mpidr | info | vINTID | 222 + Attributes: 223 + 224 + The attr field of kvm_device_attr encodes the following values:: 225 + 226 + bits: | 63 .... 32 | 31 .... 10 | 9 .... 0 | 227 + values: | mpidr | info | vINTID | 250 228 251 229 The vINTID specifies which set of IRQs is reported on. 252 230 ··· 262 228 VGIC_LEVEL_INFO_LINE_LEVEL: 263 229 Get/Set the input level of the IRQ line for a set of 32 contiguously 264 230 numbered interrupts. 231 + 265 232 vINTID must be a multiple of 32. 266 233 267 234 kvm_device_attr.addr points to a __u32 value which will contain a ··· 278 243 reported with the same value regardless of the mpidr specified. 279 244 280 245 The mpidr field encodes the CPU ID based on the affinity information in the 281 - architecture defined MPIDR, and the field is encoded as follows: 246 + architecture defined MPIDR, and the field is encoded as follows:: 247 + 282 248 | 63 .... 56 | 55 .... 48 | 47 .... 40 | 39 .... 
32 | 283 249 | Aff3 | Aff2 | Aff1 | Aff0 | 250 + 284 251 Errors: 285 - -EINVAL: vINTID is not multiple of 32 or 286 - info field is not VGIC_LEVEL_INFO_LINE_LEVEL 252 + 253 + ======= ============================================= 254 + -EINVAL vINTID is not multiple of 32 or info field is 255 + not VGIC_LEVEL_INFO_LINE_LEVEL 256 + ======= =============================================
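
The bit layouts in this file for KVM_VGIC_V3_ADDR_TYPE_REDIST_REGION, for the register-access groups and for KVM_DEV_ARM_VGIC_GRP_LEVEL_INFO can be assembled as sketched below. All function names are hypothetical helpers rather than kernel interfaces, and the preconditions (non-zero count, 64K-aligned base, vINTID a multiple of 32) are enforced with assert() purely for illustration. The info value is passed in by the caller, since its numeric encoding is not shown in this excerpt.

```c
#include <assert.h>
#include <stdint.h>

/* Pack the affinity levels as they appear in attr bits 63..32. */
static uint32_t vgic_mpidr_field(uint8_t aff3, uint8_t aff2,
				 uint8_t aff1, uint8_t aff0)
{
	return ((uint32_t)aff3 << 24) | ((uint32_t)aff2 << 16) |
	       ((uint32_t)aff1 << 8) | (uint32_t)aff0;
}

/*
 * __u64 payload for KVM_VGIC_V3_ADDR_TYPE_REDIST_REGION:
 * count[63:52] base[51:16] flags[15:12] index[11:0].
 * base carries bits [51:16] of the guest physical address,
 * so the low 16 bits of gpa must be zero. flags is left at 0.
 */
static uint64_t vgic_v3_redist_region(uint64_t gpa, uint16_t count,
				      uint16_t index)
{
	assert(count > 0 && count < (1u << 12));
	assert(index < (1u << 12));
	assert((gpa & 0xFFFF) == 0 && gpa < (UINT64_C(1) << 52));
	return ((uint64_t)count << 52) | gpa | index;
}

/* attr for DIST_REGS / REDIST_REGS: mpidr[63:32] offset[31:0]. */
static uint64_t vgic_v3_reg_attr(uint32_t mpidr_field, uint32_t offset)
{
	return ((uint64_t)mpidr_field << 32) | offset;
}

/* attr for LEVEL_INFO: mpidr[63:32] info[31:10] vINTID[9:0]. */
static uint64_t vgic_v3_level_attr(uint32_t mpidr_field, uint32_t info,
				   uint32_t vintid)
{
	assert(info < (1u << 22));
	assert(vintid < (1u << 10) && vintid % 32 == 0);
	return ((uint64_t)mpidr_field << 32) | ((uint64_t)info << 10) | vintid;
}
```

The resulting values would be placed in kvm_device_attr.attr (or, for the redistributor region, in the __u64 that kvm_device_attr.addr points to) before issuing the device attribute ioctl.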

+59 -30

Documentation/virt/kvm/devices/arm-vgic.txt Documentation/virt/kvm/devices/arm-vgic.rst

··· 1 + .. SPDX-License-Identifier: GPL-2.0 2 + 3 + ================================================== 1 4 ARM Virtual Generic Interrupt Controller v2 (VGIC) 2 5 ================================================== 3 6 4 7 Device types supported: 5 - KVM_DEV_TYPE_ARM_VGIC_V2 ARM Generic Interrupt Controller v2.0 8 + 9 + - KVM_DEV_TYPE_ARM_VGIC_V2 ARM Generic Interrupt Controller v2.0 6 10 7 11 Only one VGIC instance may be instantiated through either this API or the 8 12 legacy KVM_CREATE_IRQCHIP API. The created VGIC will act as the VM interrupt ··· 21 17 22 18 Groups: 23 19 KVM_DEV_ARM_VGIC_GRP_ADDR 24 - Attributes: 20 + Attributes: 21 + 25 22 KVM_VGIC_V2_ADDR_TYPE_DIST (rw, 64-bit) 26 23 Base address in the guest physical address space of the GIC distributor 27 24 register mappings. Only valid for KVM_DEV_TYPE_ARM_VGIC_V2. ··· 32 27 Base address in the guest physical address space of the GIC virtual cpu 33 28 interface register mappings. Only valid for KVM_DEV_TYPE_ARM_VGIC_V2. 34 29 This address needs to be 4K aligned and the region covers 4 KByte. 30 + 35 31 Errors: 36 - -E2BIG: Address outside of addressable IPA range 37 - -EINVAL: Incorrectly aligned address 38 - -EEXIST: Address already configured 39 - -ENXIO: The group or attribute is unknown/unsupported for this device 32 + 33 + ======= ============================================================= 34 + -E2BIG Address outside of addressable IPA range 35 + -EINVAL Incorrectly aligned address 36 + -EEXIST Address already configured 37 + -ENXIO The group or attribute is unknown/unsupported for this device 40 38 or hardware support is missing. 41 - -EFAULT: Invalid user pointer for attr->addr. 39 + -EFAULT Invalid user pointer for attr->addr. 40 + ======= ============================================================= 42 41 43 42 KVM_DEV_ARM_VGIC_GRP_DIST_REGS 44 - Attributes: 45 - The attr field of kvm_device_attr encodes two values: 46 - bits: | 63 .... 40 | 39 .. 32 | 31 .... 
0 | 47 - values: | reserved | vcpu_index | offset | 43 + Attributes: 44 + 45 + The attr field of kvm_device_attr encodes two values:: 46 + 47 + bits: | 63 .... 40 | 39 .. 32 | 31 .... 0 | 48 + values: | reserved | vcpu_index | offset | 48 49 49 50 All distributor regs are (rw, 32-bit) 50 51 ··· 69 58 KVM_DEV_ARM_VGIC_GRP_DIST_REGS and KVM_DEV_ARM_VGIC_GRP_CPU_REGS) to ensure 70 59 the expected behavior. Unless GICD_IIDR has been set from userspace, writes 71 60 to the interrupt group registers (GICD_IGROUPR) are ignored. 61 + 72 62 Errors: 73 - -ENXIO: Getting or setting this register is not yet supported 74 - -EBUSY: One or more VCPUs are running 75 - -EINVAL: Invalid vcpu_index supplied 63 + 64 + ======= ===================================================== 65 + -ENXIO Getting or setting this register is not yet supported 66 + -EBUSY One or more VCPUs are running 67 + -EINVAL Invalid vcpu_index supplied 68 + ======= ===================================================== 76 69 77 70 KVM_DEV_ARM_VGIC_GRP_CPU_REGS 78 - Attributes: 79 - The attr field of kvm_device_attr encodes two values: 80 - bits: | 63 .... 40 | 39 .. 32 | 31 .... 0 | 81 - values: | reserved | vcpu_index | offset | 71 + Attributes: 72 + 73 + The attr field of kvm_device_attr encodes two values:: 74 + 75 + bits: | 63 .... 40 | 39 .. 32 | 31 .... 0 | 76 + values: | reserved | vcpu_index | offset | 82 77 83 78 All CPU interface regs are (rw, 32-bit) 84 79 ··· 118 101 value left by 3 places to obtain the actual priority mask level. 
119 102 120 103 Errors: 121 - -ENXIO: Getting or setting this register is not yet supported 122 - -EBUSY: One or more VCPUs are running 123 - -EINVAL: Invalid vcpu_index supplied 104 + 105 + ======= ===================================================== 106 + -ENXIO Getting or setting this register is not yet supported 107 + -EBUSY One or more VCPUs are running 108 + -EINVAL Invalid vcpu_index supplied 109 + ======= ===================================================== 124 110 125 111 KVM_DEV_ARM_VGIC_GRP_NR_IRQS 126 - Attributes: 112 + Attributes: 113 + 127 114 A value describing the number of interrupts (SGI, PPI and SPI) for 128 115 this GIC instance, ranging from 64 to 1024, in increments of 32. 129 116 130 117 Errors: 131 - -EINVAL: Value set is out of the expected range 132 - -EBUSY: Value has already be set, or GIC has already been initialized 133 - with default values. 118 + 119 + ======= ============================================================= 120 + -EINVAL Value set is out of the expected range 121 + -EBUSY Value has already be set, or GIC has already been initialized 122 + with default values. 123 + ======= ============================================================= 134 124 135 125 KVM_DEV_ARM_VGIC_GRP_CTRL 136 - Attributes: 126 + Attributes: 127 + 137 128 KVM_DEV_ARM_VGIC_CTRL_INIT 138 129 request the initialization of the VGIC or ITS, no additional parameter 139 130 in kvm_device_attr.addr. 131 + 140 132 Errors: 141 - -ENXIO: VGIC not properly configured as required prior to calling 142 - this attribute 143 - -ENODEV: no online VCPU 144 - -ENOMEM: memory shortage when allocating vgic internal data 133 + 134 + ======= ========================================================= 135 + -ENXIO VGIC not properly configured as required prior to calling 136 + this attribute 137 + -ENODEV no online VCPU 138 + -ENOMEM memory shortage when allocating vgic internal data 139 + ======= =========================================================
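Reviewer sketch, not part of the patch: the attr encoding shared by KVM_DEV_ARM_VGIC_GRP_DIST_REGS and KVM_DEV_ARM_VGIC_GRP_CPU_REGS (vcpu_index in bits 39..32, offset in bits 31..0, bits 63..40 reserved) and the KVM_DEV_ARM_VGIC_GRP_NR_IRQS range rule (64 to 1024, in increments of 32) could be checked in userspace roughly as follows. All function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Encode vcpu_index (bits 39..32) and register offset (bits 31..0). */
static uint64_t vgic_v2_reg_attr(uint8_t vcpu_index, uint32_t offset)
{
    return ((uint64_t)vcpu_index << 32) | (uint64_t)offset;
}

/* Split an attr value back into its two fields; reserved bits are ignored. */
static void vgic_v2_reg_attr_decode(uint64_t attr, uint8_t *vcpu_index,
                                    uint32_t *offset)
{
    assert(vcpu_index != NULL && offset != NULL);
    *vcpu_index = (uint8_t)(attr >> 32);
    *offset = (uint32_t)attr;
}

/* NR_IRQS must lie in 64..1024 and be a multiple of 32. */
static bool vgic_nr_irqs_valid(uint32_t nr_irqs)
{
    return nr_irqs >= 64 && nr_irqs <= 1024 && nr_irqs % 32 == 0;
}
```

Validating the interrupt count before the ioctl only mirrors the documented -EINVAL condition; the kernel remains the authority on what it accepts.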

+19

Documentation/virt/kvm/devices/index.rst

··· 1 + .. SPDX-License-Identifier: GPL-2.0 2 + 3 + ======= 4 + Devices 5 + ======= 6 + 7 + .. toctree:: 8 + :maxdepth: 2 9 + 10 + arm-vgic-its 11 + arm-vgic 12 + arm-vgic-v3 13 + mpic 14 + s390_flic 15 + vcpu 16 + vfio 17 + vm 18 + xics 19 + xive

+8 -3

Documentation/virt/kvm/devices/mpic.txt Documentation/virt/kvm/devices/mpic.rst

··· 1 + .. SPDX-License-Identifier: GPL-2.0 2 + 3 + ========================= 1 4 MPIC interrupt controller 2 5 ========================= 3 6 4 7 Device types supported: 5 - KVM_DEV_TYPE_FSL_MPIC_20 Freescale MPIC v2.0 6 - KVM_DEV_TYPE_FSL_MPIC_42 Freescale MPIC v4.2 8 + 9 + - KVM_DEV_TYPE_FSL_MPIC_20 Freescale MPIC v2.0 10 + - KVM_DEV_TYPE_FSL_MPIC_42 Freescale MPIC v4.2 7 11 8 12 Only one MPIC instance, of any type, may be instantiated. The created 9 13 MPIC will act as the system interrupt controller, connecting to each ··· 15 11 16 12 Groups: 17 13 KVM_DEV_MPIC_GRP_MISC 18 - Attributes: 14 + Attributes: 15 + 19 16 KVM_DEV_MPIC_BASE_ADDR (rw, 64-bit) 20 17 Base address of the 256 KiB MPIC register space. Must be 21 18 naturally aligned. A value of zero disables the mapping.

+40 -30

Documentation/virt/kvm/devices/s390_flic.txt Documentation/virt/kvm/devices/s390_flic.rst

··· 1 + .. SPDX-License-Identifier: GPL-2.0 2 + 3 + ==================================== 1 4 FLIC (floating interrupt controller) 2 5 ==================================== 3 6 ··· 34 31 Copies all floating interrupts into a buffer provided by userspace. 35 32 When the buffer is too small it returns -ENOMEM, which is the indication 36 33 for userspace to try again with a bigger buffer. 34 + 37 35 -ENOBUFS is returned when the allocation of a kernelspace buffer has 38 36 failed. 37 + 39 38 -EFAULT is returned when copying data to userspace failed. 40 39 All interrupts remain pending, i.e. are not deleted from the list of 41 40 currently pending interrupts. ··· 65 60 66 61 KVM_DEV_FLIC_ADAPTER_REGISTER 67 62 Register an I/O adapter interrupt source. Takes a kvm_s390_io_adapter 68 - describing the adapter to register: 63 + describing the adapter to register:: 69 64 70 - struct kvm_s390_io_adapter { 71 - __u32 id; 72 - __u8 isc; 73 - __u8 maskable; 74 - __u8 swap; 75 - __u8 flags; 76 - }; 65 + struct kvm_s390_io_adapter { 66 + __u32 id; 67 + __u8 isc; 68 + __u8 maskable; 69 + __u8 swap; 70 + __u8 flags; 71 + }; 77 72 78 73 id contains the unique id for the adapter, isc the I/O interruption subclass 79 74 to use, maskable whether this adapter may be masked (interrupts turned off), 80 75 swap whether the indicators need to be byte swapped, and flags contains 81 76 further characteristics of the adapter. 77 + 82 78 Currently defined values for 'flags' are: 79 + 83 80 - KVM_S390_ADAPTER_SUPPRESSIBLE: adapter is subject to AIS 84 81 (adapter-interrupt-suppression) facility. This flag only has an effect if 85 82 the AIS capability is enabled. 83 + 86 84 Unknown flag values are ignored. 87 85 88 86 89 87 KVM_DEV_FLIC_ADAPTER_MODIFY 90 88 Modifies attributes of an existing I/O adapter interrupt source. 
Takes 91 - a kvm_s390_io_adapter_req specifying the adapter and the operation: 89 + a kvm_s390_io_adapter_req specifying the adapter and the operation:: 92 90 93 - struct kvm_s390_io_adapter_req { 94 - __u32 id; 95 - __u8 type; 96 - __u8 mask; 97 - __u16 pad0; 98 - __u64 addr; 99 - }; 91 + struct kvm_s390_io_adapter_req { 92 + __u32 id; 93 + __u8 type; 94 + __u8 mask; 95 + __u16 pad0; 96 + __u64 addr; 97 + }; 100 98 101 99 id specifies the adapter and type the operation. The supported operations 102 100 are: ··· 111 103 perform a gmap translation for the guest address provided in addr, 112 104 pin a userspace page for the translated address and add it to the 113 105 list of mappings 114 - Note: A new mapping will be created unconditionally; therefore, 115 - the calling code should avoid making duplicate mappings. 106 + 107 + .. note:: A new mapping will be created unconditionally; therefore, 108 + the calling code should avoid making duplicate mappings. 116 109 117 110 KVM_S390_IO_ADAPTER_UNMAP 118 111 release a userspace page for the translated address specified in addr ··· 121 112 122 113 KVM_DEV_FLIC_AISM 123 114 modify the adapter-interruption-suppression mode for a given isc if the 124 - AIS capability is enabled. Takes a kvm_s390_ais_req describing: 115 + AIS capability is enabled. Takes a kvm_s390_ais_req describing:: 125 116 126 - struct kvm_s390_ais_req { 127 - __u8 isc; 128 - __u16 mode; 129 - }; 117 + struct kvm_s390_ais_req { 118 + __u8 isc; 119 + __u16 mode; 120 + }; 130 121 131 122 isc contains the target I/O interruption subclass, mode the target 132 123 adapter-interruption-suppression mode. The following modes are 133 124 currently supported: 125 + 134 126 - KVM_S390_AIS_MODE_ALL: ALL-Interruptions Mode, i.e. airq injection 135 127 is always allowed; 136 128 - KVM_S390_AIS_MODE_SINGLE: SINGLE-Interruption Mode, i.e. airq ··· 149 139 150 140 KVM_DEV_FLIC_AISM_ALL 151 141 Gets or sets the adapter-interruption-suppression mode for all ISCs. 
Takes 152 - a kvm_s390_ais_all describing: 142 + a kvm_s390_ais_all describing:: 153 143 154 - struct kvm_s390_ais_all { 155 - __u8 simm; /* Single-Interruption-Mode mask */ 156 - __u8 nimm; /* No-Interruption-Mode mask * 157 - }; 144 + struct kvm_s390_ais_all { 145 + __u8 simm; /* Single-Interruption-Mode mask */ 146 + __u8 nimm; /* No-Interruption-Mode mask */ 147 + }; 158 148 159 149 simm contains Single-Interruption-Mode mask for all ISCs, nimm contains 160 150 No-Interruption-Mode mask for all ISCs. Each bit in simm and nimm corresponds ··· 169 159 that a FLIC operation is unavailable based on the error code resulting from a 170 160 usage attempt. 171 161 172 - Note: The KVM_DEV_FLIC_CLEAR_IO_IRQ ioctl will return EINVAL in case a zero 173 - schid is specified. 162 + .. note:: The KVM_DEV_FLIC_CLEAR_IO_IRQ ioctl will return EINVAL in case a 163 + zero schid is specified.

+114

Documentation/virt/kvm/devices/vcpu.rst

··· 1 + .. SPDX-License-Identifier: GPL-2.0 2 + 3 + ====================== 4 + Generic vcpu interface 5 + ====================== 6 + 7 + The virtual cpu "device" also accepts the ioctls KVM_SET_DEVICE_ATTR, 8 + KVM_GET_DEVICE_ATTR, and KVM_HAS_DEVICE_ATTR. The interface uses the same struct 9 + kvm_device_attr as other devices, but targets VCPU-wide settings and controls. 10 + 11 + The groups and attributes per virtual cpu, if any, are architecture specific. 12 + 13 + 1. GROUP: KVM_ARM_VCPU_PMU_V3_CTRL 14 + ================================== 15 + 16 + :Architectures: ARM64 17 + 18 + 1.1. ATTRIBUTE: KVM_ARM_VCPU_PMU_V3_IRQ 19 + --------------------------------------- 20 + 21 + :Parameters: in kvm_device_attr.addr the address for PMU overflow interrupt is a 22 + pointer to an int 23 + 24 + Returns: 25 + 26 + ======= ======================================================== 27 + -EBUSY The PMU overflow interrupt is already set 28 + -ENXIO The overflow interrupt not set when attempting to get it 29 + -ENODEV PMUv3 not supported 30 + -EINVAL Invalid PMU overflow interrupt number supplied or 31 + trying to set the IRQ number without using an in-kernel 32 + irqchip. 33 + ======= ======================================================== 34 + 35 + A value describing the PMUv3 (Performance Monitor Unit v3) overflow interrupt 36 + number for this vcpu. This interrupt could be a PPI or SPI, but the interrupt 37 + type must be same for each vcpu. As a PPI, the interrupt number is the same for 38 + all vcpus, while as an SPI it must be a separate number per vcpu. 
39 + 40 + 1.2 ATTRIBUTE: KVM_ARM_VCPU_PMU_V3_INIT 41 + --------------------------------------- 42 + 43 + :Parameters: no additional parameter in kvm_device_attr.addr 44 + 45 + Returns: 46 + 47 + ======= ====================================================== 48 + -ENODEV PMUv3 not supported or GIC not initialized 49 + -ENXIO PMUv3 not properly configured or in-kernel irqchip not 50 + configured as required prior to calling this attribute 51 + -EBUSY PMUv3 already initialized 52 + ======= ====================================================== 53 + 54 + Request the initialization of the PMUv3. If using the PMUv3 with an in-kernel 55 + virtual GIC implementation, this must be done after initializing the in-kernel 56 + irqchip. 57 + 58 + 59 + 2. GROUP: KVM_ARM_VCPU_TIMER_CTRL 60 + ================================= 61 + 62 + :Architectures: ARM, ARM64 63 + 64 + 2.1. ATTRIBUTES: KVM_ARM_VCPU_TIMER_IRQ_VTIMER, KVM_ARM_VCPU_TIMER_IRQ_PTIMER 65 + ----------------------------------------------------------------------------- 66 + 67 + :Parameters: in kvm_device_attr.addr the address for the timer interrupt is a 68 + pointer to an int 69 + 70 + Returns: 71 + 72 + ======= ================================= 73 + -EINVAL Invalid timer interrupt number 74 + -EBUSY One or more VCPUs has already run 75 + ======= ================================= 76 + 77 + A value describing the architected timer interrupt number when connected to an 78 + in-kernel virtual GIC. These must be a PPI (16 <= intid < 32). Setting the 79 + attribute overrides the default values (see below). 80 + 81 + ============================= ========================================== 82 + KVM_ARM_VCPU_TIMER_IRQ_VTIMER The EL1 virtual timer intid (default: 27) 83 + KVM_ARM_VCPU_TIMER_IRQ_PTIMER The EL1 physical timer intid (default: 30) 84 + ============================= ========================================== 85 + 86 + Setting the same PPI for different timers will prevent the VCPUs from running. 
87 + Setting the interrupt number on a VCPU configures all VCPUs created at that 88 + time to use the number provided for a given timer, overwriting any previously 89 + configured values on other VCPUs. Userspace should configure the interrupt 90 + numbers on at least one VCPU after creating all VCPUs and before running any 91 + VCPUs. 92 + 93 + 3. GROUP: KVM_ARM_VCPU_PVTIME_CTRL 94 + ================================== 95 + 96 + :Architectures: ARM64 97 + 98 + 3.1 ATTRIBUTE: KVM_ARM_VCPU_PVTIME_IPA 99 + -------------------------------------- 100 + 101 + :Parameters: 64-bit base address 102 + 103 + Returns: 104 + 105 + ======= ====================================== 106 + -ENXIO Stolen time not implemented 107 + -EEXIST Base address already set for this VCPU 108 + -EINVAL Base address not 64 byte aligned 109 + ======= ====================================== 110 + 111 + Specifies the base address of the stolen time structure for this VCPU. The 112 + base address must be 64 byte aligned and exist within a valid guest memory 113 + region. See Documentation/virt/kvm/arm/pvtime.txt for more information 114 + including the layout of the stolen time structure.
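Reviewer sketch, not part of the patch: the constraints stated in vcpu.rst lend themselves to a userspace pre-check before issuing KVM_SET_DEVICE_ATTR. The helpers below are hypothetical and encode only what the text says: timer interrupts must be PPIs (16 <= intid < 32), the two timers must not share a PPI, and the stolen-time base address must be 64-byte aligned.

```c
#include <stdbool.h>
#include <stdint.h>

/* Documented defaults for the EL1 virtual and physical timers. */
#define DEFAULT_VTIMER_INTID 27
#define DEFAULT_PTIMER_INTID 30

/* A PPI is an interrupt ID in the range 16 <= intid < 32. */
static bool is_ppi(int intid)
{
    return intid >= 16 && intid < 32;
}

/*
 * Both timer interrupts must be PPIs and must differ, since setting the
 * same PPI for different timers prevents the VCPUs from running.
 */
static bool timer_irqs_ok(int vtimer_intid, int ptimer_intid)
{
    return is_ppi(vtimer_intid) && is_ppi(ptimer_intid) &&
           vtimer_intid != ptimer_intid;
}

/* KVM_ARM_VCPU_PVTIME_IPA requires a 64-byte aligned base address. */
static bool pvtime_ipa_aligned(uint64_t ipa)
{
    return (ipa & 63u) == 0;
}
```

The alignment check covers only the -EINVAL case; whether the address lies within a valid guest memory region still has to be established by the caller.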

-76

Documentation/virt/kvm/devices/vcpu.txt

··· 1 - Generic vcpu interface 2 - ==================================== 3 - 4 - The virtual cpu "device" also accepts the ioctls KVM_SET_DEVICE_ATTR, 5 - KVM_GET_DEVICE_ATTR, and KVM_HAS_DEVICE_ATTR. The interface uses the same struct 6 - kvm_device_attr as other devices, but targets VCPU-wide settings and controls. 7 - 8 - The groups and attributes per virtual cpu, if any, are architecture specific. 9 - 10 - 1. GROUP: KVM_ARM_VCPU_PMU_V3_CTRL 11 - Architectures: ARM64 12 - 13 - 1.1. ATTRIBUTE: KVM_ARM_VCPU_PMU_V3_IRQ 14 - Parameters: in kvm_device_attr.addr the address for PMU overflow interrupt is a 15 - pointer to an int 16 - Returns: -EBUSY: The PMU overflow interrupt is already set 17 - -ENXIO: The overflow interrupt not set when attempting to get it 18 - -ENODEV: PMUv3 not supported 19 - -EINVAL: Invalid PMU overflow interrupt number supplied or 20 - trying to set the IRQ number without using an in-kernel 21 - irqchip. 22 - 23 - A value describing the PMUv3 (Performance Monitor Unit v3) overflow interrupt 24 - number for this vcpu. This interrupt could be a PPI or SPI, but the interrupt 25 - type must be same for each vcpu. As a PPI, the interrupt number is the same for 26 - all vcpus, while as an SPI it must be a separate number per vcpu. 27 - 28 - 1.2 ATTRIBUTE: KVM_ARM_VCPU_PMU_V3_INIT 29 - Parameters: no additional parameter in kvm_device_attr.addr 30 - Returns: -ENODEV: PMUv3 not supported or GIC not initialized 31 - -ENXIO: PMUv3 not properly configured or in-kernel irqchip not 32 - configured as required prior to calling this attribute 33 - -EBUSY: PMUv3 already initialized 34 - 35 - Request the initialization of the PMUv3. If using the PMUv3 with an in-kernel 36 - virtual GIC implementation, this must be done after initializing the in-kernel 37 - irqchip. 38 - 39 - 40 - 2. GROUP: KVM_ARM_VCPU_TIMER_CTRL 41 - Architectures: ARM,ARM64 42 - 43 - 2.1. ATTRIBUTE: KVM_ARM_VCPU_TIMER_IRQ_VTIMER 44 - 2.2. 
ATTRIBUTE: KVM_ARM_VCPU_TIMER_IRQ_PTIMER 45 - Parameters: in kvm_device_attr.addr the address for the timer interrupt is a 46 - pointer to an int 47 - Returns: -EINVAL: Invalid timer interrupt number 48 - -EBUSY: One or more VCPUs has already run 49 - 50 - A value describing the architected timer interrupt number when connected to an 51 - in-kernel virtual GIC. These must be a PPI (16 <= intid < 32). Setting the 52 - attribute overrides the default values (see below). 53 - 54 - KVM_ARM_VCPU_TIMER_IRQ_VTIMER: The EL1 virtual timer intid (default: 27) 55 - KVM_ARM_VCPU_TIMER_IRQ_PTIMER: The EL1 physical timer intid (default: 30) 56 - 57 - Setting the same PPI for different timers will prevent the VCPUs from running. 58 - Setting the interrupt number on a VCPU configures all VCPUs created at that 59 - time to use the number provided for a given timer, overwriting any previously 60 - configured values on other VCPUs. Userspace should configure the interrupt 61 - numbers on at least one VCPU after creating all VCPUs and before running any 62 - VCPUs. 63 - 64 - 3. GROUP: KVM_ARM_VCPU_PVTIME_CTRL 65 - Architectures: ARM64 66 - 67 - 3.1 ATTRIBUTE: KVM_ARM_VCPU_PVTIME_IPA 68 - Parameters: 64-bit base address 69 - Returns: -ENXIO: Stolen time not implemented 70 - -EEXIST: Base address already set for this VCPU 71 - -EINVAL: Base address not 64 byte aligned 72 - 73 - Specifies the base address of the stolen time structure for this VCPU. The 74 - base address must be 64 byte aligned and exist within a valid guest memory 75 - region. See Documentation/virt/kvm/arm/pvtime.txt for more information 76 - including the layout of the stolen time structure.

+15 -10

Documentation/virt/kvm/devices/vfio.txt Documentation/virt/kvm/devices/vfio.rst

··· 1 + .. SPDX-License-Identifier: GPL-2.0 2 + 3 + =================== 1 4 VFIO virtual device 2 5 =================== 3 6 4 7 Device types supported: 5 - KVM_DEV_TYPE_VFIO 8 + 9 + - KVM_DEV_TYPE_VFIO 6 10 7 11 Only one VFIO instance may be created per VM. The created device 8 12 tracks VFIO groups in use by the VM and features of those groups ··· 27 23 for the VFIO group. 28 24 KVM_DEV_VFIO_GROUP_SET_SPAPR_TCE: attaches a guest visible TCE table 29 25 allocated by sPAPR KVM. 30 - kvm_device_attr.addr points to a struct: 26 + kvm_device_attr.addr points to a struct:: 31 27 32 - struct kvm_vfio_spapr_tce { 33 - __s32 groupfd; 34 - __s32 tablefd; 35 - }; 28 + struct kvm_vfio_spapr_tce { 29 + __s32 groupfd; 30 + __s32 tablefd; 31 + }; 36 32 37 - where 38 - @groupfd is a file descriptor for a VFIO group; 39 - @tablefd is a file descriptor for a TCE table allocated via 40 - KVM_CREATE_SPAPR_TCE. 33 + where: 34 + 35 + - @groupfd is a file descriptor for a VFIO group; 36 + - @tablefd is a file descriptor for a TCE table allocated via 37 + KVM_CREATE_SPAPR_TCE.

+126 -80

Documentation/virt/kvm/devices/vm.txt Documentation/virt/kvm/devices/vm.rst

··· 1 + .. SPDX-License-Identifier: GPL-2.0 2 + 3 + ==================== 1 4 Generic vm interface 2 - ==================================== 5 + ==================== 3 6 4 7 The virtual machine "device" also accepts the ioctls KVM_SET_DEVICE_ATTR, 5 8 KVM_GET_DEVICE_ATTR, and KVM_HAS_DEVICE_ATTR. The interface uses the same ··· 13 10 specific. 14 11 15 12 1. GROUP: KVM_S390_VM_MEM_CTRL 16 - Architectures: s390 13 + ============================== 14 + 15 + :Architectures: s390 17 16 18 17 1.1. ATTRIBUTE: KVM_S390_VM_MEM_ENABLE_CMMA 19 - Parameters: none 20 - Returns: -EBUSY if a vcpu is already defined, otherwise 0 18 + ------------------------------------------- 19 + 20 + :Parameters: none 21 + :Returns: -EBUSY if a vcpu is already defined, otherwise 0 21 22 22 23 Enables Collaborative Memory Management Assist (CMMA) for the virtual machine. 23 24 24 25 1.2. ATTRIBUTE: KVM_S390_VM_MEM_CLR_CMMA 25 - Parameters: none 26 - Returns: -EINVAL if CMMA was not enabled 27 - 0 otherwise 26 + ---------------------------------------- 27 + 28 + :Parameters: none 29 + :Returns: -EINVAL if CMMA was not enabled; 30 + 0 otherwise 28 31 29 32 Clear the CMMA status for all guest pages, so any pages the guest marked 30 33 as unused are again used any may not be reclaimed by the host. 31 34 32 35 1.3. 
ATTRIBUTE KVM_S390_VM_MEM_LIMIT_SIZE 33 - Parameters: in attr->addr the address for the new limit of guest memory 34 - Returns: -EFAULT if the given address is not accessible 35 - -EINVAL if the virtual machine is of type UCONTROL 36 - -E2BIG if the given guest memory is to big for that machine 37 - -EBUSY if a vcpu is already defined 38 - -ENOMEM if not enough memory is available for a new shadow guest mapping 39 - 0 otherwise 36 + ----------------------------------------- 37 + 38 + :Parameters: in attr->addr the address for the new limit of guest memory 39 + :Returns: -EFAULT if the given address is not accessible; 40 + -EINVAL if the virtual machine is of type UCONTROL; 41 + -E2BIG if the given guest memory is to big for that machine; 42 + -EBUSY if a vcpu is already defined; 43 + -ENOMEM if not enough memory is available for a new shadow guest mapping; 44 + 0 otherwise. 40 45 41 46 Allows userspace to query the actual limit and set a new limit for 42 47 the maximum guest memory size. The limit will be rounded up to ··· 53 42 the limit to KVM_S390_NO_MEM_LIMIT (U64_MAX). 54 43 55 44 2. GROUP: KVM_S390_VM_CPU_MODEL 56 - Architectures: s390 45 + =============================== 46 + 47 + :Architectures: s390 57 48 58 49 2.1. 
ATTRIBUTE: KVM_S390_VM_CPU_MACHINE (r/o) 50 + --------------------------------------------- 59 51 60 - Allows user space to retrieve machine and kvm specific cpu related information: 52 + Allows user space to retrieve machine and kvm specific cpu related information:: 61 53 62 - struct kvm_s390_vm_cpu_machine { 54 + struct kvm_s390_vm_cpu_machine { 63 55 __u64 cpuid; # CPUID of host 64 56 __u32 ibc; # IBC level range offered by host 65 57 __u8 pad[4]; 66 58 __u64 fac_mask[256]; # set of cpu facilities enabled by KVM 67 59 __u64 fac_list[256]; # set of cpu facilities offered by host 68 - } 60 + } 69 61 70 - Parameters: address of buffer to store the machine related cpu data 71 - of type struct kvm_s390_vm_cpu_machine* 72 - Returns: -EFAULT if the given address is not accessible from kernel space 73 - -ENOMEM if not enough memory is available to process the ioctl 74 - 0 in case of success 62 + :Parameters: address of buffer to store the machine related cpu data 63 + of type struct kvm_s390_vm_cpu_machine* 64 + :Returns: -EFAULT if the given address is not accessible from kernel space; 65 + -ENOMEM if not enough memory is available to process the ioctl; 66 + 0 in case of success. 75 67 76 68 2.2. ATTRIBUTE: KVM_S390_VM_CPU_PROCESSOR (r/w) 69 + =============================================== 77 70 78 - Allows user space to retrieve or request to change cpu related information for a vcpu: 71 + Allows user space to retrieve or request to change cpu related information for a vcpu:: 79 72 80 - struct kvm_s390_vm_cpu_processor { 73 + struct kvm_s390_vm_cpu_processor { 81 74 __u64 cpuid; # CPUID currently (to be) used by this vcpu 82 75 __u16 ibc; # IBC level currently (to be) used by this vcpu 83 76 __u8 pad[6]; 84 77 __u64 fac_list[256]; # set of cpu facilities currently (to be) used 85 - # by this vcpu 86 - } 78 + # by this vcpu 79 + } 87 80 88 81 KVM does not enforce or limit the cpu model data in any form. 
Take the information 89 82 retrieved by means of KVM_S390_VM_CPU_MACHINE as hint for reasonable configuration 90 83 setups. Instruction interceptions triggered by additionally set facility bits that 91 84 are not handled by KVM need to be implemented in the VM driver code. 92 85 93 - Parameters: address of buffer to store/set the processor related cpu 94 - data of type struct kvm_s390_vm_cpu_processor*. 95 - Returns: -EBUSY in case 1 or more vcpus are already activated (only in write case) 96 - -EFAULT if the given address is not accessible from kernel space 97 - -ENOMEM if not enough memory is available to process the ioctl 98 - 0 in case of success 86 + :Parameters: address of buffer to store/set the processor related cpu 87 + data of type struct kvm_s390_vm_cpu_processor*. 88 + :Returns: -EBUSY in case 1 or more vcpus are already activated (only in write case); 89 + -EFAULT if the given address is not accessible from kernel space; 90 + -ENOMEM if not enough memory is available to process the ioctl; 91 + 0 in case of success. 92 + 93 + .. _KVM_S390_VM_CPU_MACHINE_FEAT: 99 94 100 95 2.3. ATTRIBUTE: KVM_S390_VM_CPU_MACHINE_FEAT (r/o) 96 + -------------------------------------------------- 101 97 102 98 Allows user space to retrieve available cpu features. A feature is available if 103 99 provided by the hardware and supported by kvm. In theory, cpu features could 104 100 even be completely emulated by kvm. 105 101 106 - struct kvm_s390_vm_cpu_feat { 107 - __u64 feat[16]; # Bitmap (1 = feature available), MSB 0 bit numbering 108 - }; 102 + :: 109 103 110 - Parameters: address of a buffer to load the feature list from. 111 - Returns: -EFAULT if the given address is not accessible from kernel space. 112 - 0 in case of success. 104 + struct kvm_s390_vm_cpu_feat { 105 + __u64 feat[16]; # Bitmap (1 = feature available), MSB 0 bit numbering 106 + }; 107 + 108 + :Parameters: address of a buffer to load the feature list from. 
109 + :Returns: -EFAULT if the given address is not accessible from kernel space; 110 + 0 in case of success. 113 111 114 112 2.4. ATTRIBUTE: KVM_S390_VM_CPU_PROCESSOR_FEAT (r/w) 113 + ---------------------------------------------------- 115 114 116 115 Allows user space to retrieve or change enabled cpu features for all VCPUs of a 117 116 VM. Features that are not available cannot be enabled. 118 117 119 - See 2.3. for a description of the parameter struct. 118 + See :ref:`KVM_S390_VM_CPU_MACHINE_FEAT` for 119 + a description of the parameter struct. 120 120 121 - Parameters: address of a buffer to store/load the feature list from. 122 - Returns: -EFAULT if the given address is not accessible from kernel space. 123 - -EINVAL if a cpu feature that is not available is to be enabled. 124 - -EBUSY if at least one VCPU has already been defined. 121 + :Parameters: address of a buffer to store/load the feature list from. 122 + :Returns: -EFAULT if the given address is not accessible from kernel space; 123 + -EINVAL if a cpu feature that is not available is to be enabled; 124 + -EBUSY if at least one VCPU has already been defined; 125 125 0 in case of success. 126 126 127 + .. _KVM_S390_VM_CPU_MACHINE_SUBFUNC: 128 + 127 129 2.5. ATTRIBUTE: KVM_S390_VM_CPU_MACHINE_SUBFUNC (r/o) 130 + ----------------------------------------------------- 128 131 129 132 Allows user space to retrieve available cpu subfunctions without any filtering 130 133 done by a set IBC. These subfunctions are indicated to the guest VCPU via ··· 151 126 indicates subfunctions via a "test bit" mechanism, the subfunction codes are 152 127 contained in the returned struct in MSB 0 bit numbering. 
153 128 154 - struct kvm_s390_vm_cpu_subfunc { 129 + :: 130 + 131 + struct kvm_s390_vm_cpu_subfunc { 155 132 u8 plo[32]; # always valid (ESA/390 feature) 156 133 u8 ptff[16]; # valid with TOD-clock steering 157 134 u8 kmac[16]; # valid with Message-Security-Assist ··· 170 143 u8 kma[16]; # valid with Message-Security-Assist-Extension 8 171 144 u8 kdsa[16]; # valid with Message-Security-Assist-Extension 9 172 145 u8 reserved[1792]; # reserved for future instructions 173 - }; 146 + }; 174 147 175 - Parameters: address of a buffer to load the subfunction blocks from. 176 - Returns: -EFAULT if the given address is not accessible from kernel space. 148 + :Parameters: address of a buffer to load the subfunction blocks from. 149 + :Returns: -EFAULT if the given address is not accessible from kernel space; 177 150 0 in case of success. 178 151 179 152 2.6. ATTRIBUTE: KVM_S390_VM_CPU_PROCESSOR_SUBFUNC (r/w) 153 + ------------------------------------------------------- 180 154 181 155 Allows user space to retrieve or change cpu subfunctions to be indicated for 182 156 all VCPUs of a VM. This attribute will only be available if kernel and ··· 192 164 to determine available subfunctions in this case, this will guarantee backward 193 165 compatibility. 194 166 195 - See 2.5. for a description of the parameter struct. 167 + See :ref:`KVM_S390_VM_CPU_MACHINE_SUBFUNC` for a 168 + description of the parameter struct. 196 169 197 - Parameters: address of a buffer to store/load the subfunction blocks from. 198 - Returns: -EFAULT if the given address is not accessible from kernel space. 199 - -EINVAL when reading, if there was no write yet. 200 - -EBUSY if at least one VCPU has already been defined. 170 + :Parameters: address of a buffer to store/load the subfunction blocks from. 
171 + :Returns: -EFAULT if the given address is not accessible from kernel space; 172 + -EINVAL when reading, if there was no write yet; 173 + -EBUSY if at least one VCPU has already been defined; 201 174 0 in case of success. 202 175 203 176 3. GROUP: KVM_S390_VM_TOD 204 - Architectures: s390 177 + ========================= 178 + 179 + :Architectures: s390 205 180 206 181 3.1. ATTRIBUTE: KVM_S390_VM_TOD_HIGH 182 + ------------------------------------ 207 183 208 184 Allows user space to set/get the TOD clock extension (u8) (superseded by 209 185 KVM_S390_VM_TOD_EXT). 210 186 211 - Parameters: address of a buffer in user space to store the data (u8) to 212 - Returns: -EFAULT if the given address is not accessible from kernel space 187 + :Parameters: address of a buffer in user space to store the data (u8) to 188 + :Returns: -EFAULT if the given address is not accessible from kernel space; 213 189 -EINVAL if setting the TOD clock extension to != 0 is not supported 214 190 215 191 3.2. ATTRIBUTE: KVM_S390_VM_TOD_LOW 192 + ----------------------------------- 216 193 217 194 Allows user space to set/get bits 0-63 of the TOD clock register as defined in 218 195 the POP (u64). 219 196 220 - Parameters: address of a buffer in user space to store the data (u64) to 221 - Returns: -EFAULT if the given address is not accessible from kernel space 197 + :Parameters: address of a buffer in user space to store the data (u64) to 198 + :Returns: -EFAULT if the given address is not accessible from kernel space 222 199 223 200 3.3. ATTRIBUTE: KVM_S390_VM_TOD_EXT 201 + ----------------------------------- 202 + 224 203 Allows user space to set/get bits 0-63 of the TOD clock register as defined in 225 204 the POP (u64). If the guest CPU model supports the TOD clock extension (u8), it 226 205 also allows user space to get/set it. If the guest CPU model does not support 227 206 it, it is stored as 0 and not allowed to be set to a value != 0. 
228 207 229 - Parameters: address of a buffer in user space to store the data 230 - (kvm_s390_vm_tod_clock) to 231 - Returns: -EFAULT if the given address is not accessible from kernel space 208 + :Parameters: address of a buffer in user space to store the data 209 + (kvm_s390_vm_tod_clock) to 210 + :Returns: -EFAULT if the given address is not accessible from kernel space; 232 211 -EINVAL if setting the TOD clock extension to != 0 is not supported 233 212 234 213 4. GROUP: KVM_S390_VM_CRYPTO 235 - Architectures: s390 214 + ============================ 215 + 216 + :Architectures: s390 236 217 237 218 4.1. ATTRIBUTE: KVM_S390_VM_CRYPTO_ENABLE_AES_KW (w/o) 219 + ------------------------------------------------------ 238 220 239 221 Allows user space to enable aes key wrapping, including generating a new 240 222 wrapping key. 241 223 242 - Parameters: none 243 - Returns: 0 224 + :Parameters: none 225 + :Returns: 0 244 226 245 227 4.2. ATTRIBUTE: KVM_S390_VM_CRYPTO_ENABLE_DEA_KW (w/o) 228 + ------------------------------------------------------ 246 229 247 230 Allows user space to enable dea key wrapping, including generating a new 248 231 wrapping key. 249 232 250 - Parameters: none 251 - Returns: 0 233 + :Parameters: none 234 + :Returns: 0 252 235 253 236 4.3. ATTRIBUTE: KVM_S390_VM_CRYPTO_DISABLE_AES_KW (w/o) 237 + ------------------------------------------------------- 254 238 255 239 Allows user space to disable aes key wrapping, clearing the wrapping key. 256 240 257 - Parameters: none 258 - Returns: 0 241 + :Parameters: none 242 + :Returns: 0 259 243 260 244 4.4. ATTRIBUTE: KVM_S390_VM_CRYPTO_DISABLE_DEA_KW (w/o) 245 + ------------------------------------------------------- 261 246 262 247 Allows user space to disable dea key wrapping, clearing the wrapping key. 263 248 264 - Parameters: none 265 - Returns: 0 249 + :Parameters: none 250 + :Returns: 0 266 251 267 252 5. 
GROUP: KVM_S390_VM_MIGRATION 268 - Architectures: s390 253 + =============================== 254 + 255 + :Architectures: s390 269 256 270 257 5.1. ATTRIBUTE: KVM_S390_VM_MIGRATION_STOP (w/o) 258 + ------------------------------------------------ 271 259 272 260 Allows userspace to stop migration mode, needed for PGSTE migration. 273 261 Setting this attribute when migration mode is not active will have no 274 262 effects. 275 263 276 - Parameters: none 277 - Returns: 0 264 + :Parameters: none 265 + :Returns: 0 278 266 279 267 5.2. ATTRIBUTE: KVM_S390_VM_MIGRATION_START (w/o) 268 + ------------------------------------------------- 280 269 281 270 Allows userspace to start migration mode, needed for PGSTE migration. 282 271 Setting this attribute when migration mode is already active will have 283 272 no effects. 284 273 285 - Parameters: none 286 - Returns: -ENOMEM if there is not enough free memory to start migration mode 287 - -EINVAL if the state of the VM is invalid (e.g. no memory defined) 274 + :Parameters: none 275 + :Returns: -ENOMEM if there is not enough free memory to start migration mode; 276 + -EINVAL if the state of the VM is invalid (e.g. no memory defined); 288 277 0 in case of success. 289 278 290 279 5.3. ATTRIBUTE: KVM_S390_VM_MIGRATION_STATUS (r/o) 280 + -------------------------------------------------- 291 281 292 282 Allows userspace to query the status of migration mode. 293 283 294 - Parameters: address of a buffer in user space to store the data (u64) to; 295 - the data itself is either 0 if migration mode is disabled or 1 296 - if it is enabled 297 - Returns: -EFAULT if the given address is not accessible from kernel space 284 + :Parameters: address of a buffer in user space to store the data (u64) to; 285 + the data itself is either 0 if migration mode is disabled or 1 286 + if it is enabled 287 + :Returns: -EFAULT if the given address is not accessible from kernel space; 298 288 0 in case of success.

+22 -6

Documentation/virt/kvm/devices/xics.txt Documentation/virt/kvm/devices/xics.rst

··· 1 + .. SPDX-License-Identifier: GPL-2.0 2 + 3 + ========================= 1 4 XICS interrupt controller 5 + ========================= 2 6 3 7 Device type supported: KVM_DEV_TYPE_XICS 4 8 5 9 Groups: 6 10 1. KVM_DEV_XICS_GRP_SOURCES 7 - Attributes: One per interrupt source, indexed by the source number. 11 + Attributes: 8 12 13 + One per interrupt source, indexed by the source number. 9 14 2. KVM_DEV_XICS_GRP_CTRL 10 - Attributes: 11 - 2.1 KVM_DEV_XICS_NR_SERVERS (write only) 15 + Attributes: 16 + 17 + 2.1 KVM_DEV_XICS_NR_SERVERS (write only) 18 + 12 19 The kvm_device_attr.addr points to a __u32 value which is the number of 13 20 interrupt server numbers (ie, highest possible vcpu id plus one). 21 + 14 22 Errors: 15 - -EINVAL: Value greater than KVM_MAX_VCPU_ID. 16 - -EFAULT: Invalid user pointer for attr->addr. 17 - -EBUSY: A vcpu is already connected to the device. 23 + 24 + ======= ========================================== 25 + -EINVAL Value greater than KVM_MAX_VCPU_ID. 26 + -EFAULT Invalid user pointer for attr->addr. 27 + -EBUSY A vcpu is already connected to the device. 28 + ======= ========================================== 18 29 19 30 This device emulates the XICS (eXternal Interrupt Controller 20 31 Specification) defined in PAPR. The XICS has a set of interrupt ··· 64 53 bitfields, starting from the least-significant end of the word: 65 54 66 55 * Destination (server number), 32 bits 56 + 67 57 This specifies where the interrupt should be sent, and is the 68 58 interrupt server number specified for the destination vcpu. 69 59 70 60 * Priority, 8 bits 61 + 71 62 This is the priority specified for this interrupt source, where 0 is 72 63 the highest priority and 255 is the lowest. An interrupt with a 73 64 priority of 255 will never be delivered. 74 65 75 66 * Level sensitive flag, 1 bit 67 + 76 68 This bit is 1 for a level-sensitive interrupt source, or 0 for 77 69 edge-sensitive (or MSI). 
78 70 79 71 * Masked flag, 1 bit 72 + 80 73 This bit is set to 1 if the interrupt is masked (cannot be delivered 81 74 regardless of its priority), for example by the ibm,int-off RTAS 82 75 call, or 0 if it is not masked. 83 76 84 77 * Pending flag, 1 bit 78 + 85 79 This bit is 1 if the source has a pending interrupt, otherwise 0. 86 80 87 81 Only one XICS instance may be created per VM.
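The source-state layout listed above (destination, priority, then the three flags, counted from the least-significant bit) can be sketched in C as follows. This is only an illustration: the `XICS_SRC_*` macros, `struct xics_src_state`, and the helper names are invented for this example and are not taken from the kernel headers. Only the field order and widths come from the text.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names. Field order and widths follow the list above,
 * starting from the least-significant end of the 64-bit word. */
#define XICS_SRC_DEST_SHIFT  0
#define XICS_SRC_DEST_MASK   0xffffffffULL   /* 32 bits */
#define XICS_SRC_PRIO_SHIFT  32
#define XICS_SRC_PRIO_MASK   0xffULL         /* 8 bits */
#define XICS_SRC_LEVEL       (1ULL << 40)    /* level-sensitive flag */
#define XICS_SRC_MASKED      (1ULL << 41)    /* masked flag */
#define XICS_SRC_PENDING     (1ULL << 42)    /* pending flag */

struct xics_src_state {
	uint32_t server;    /* destination (interrupt server number) */
	uint8_t  priority;  /* 0 is highest, 255 is lowest */
	bool     level;
	bool     masked;
	bool     pending;
};

/* Pack the fields into the 64-bit value pointed to by kvm_device_attr.addr. */
static inline uint64_t xics_src_pack(const struct xics_src_state *s)
{
	uint64_t v;

	assert(s != NULL);
	v  = ((uint64_t)s->server & XICS_SRC_DEST_MASK) << XICS_SRC_DEST_SHIFT;
	v |= ((uint64_t)s->priority & XICS_SRC_PRIO_MASK) << XICS_SRC_PRIO_SHIFT;
	if (s->level)
		v |= XICS_SRC_LEVEL;
	if (s->masked)
		v |= XICS_SRC_MASKED;
	if (s->pending)
		v |= XICS_SRC_PENDING;
	return v;
}

/* Split a 64-bit source state back into its fields. */
static inline struct xics_src_state xics_src_unpack(uint64_t v)
{
	struct xics_src_state s = {
		.server   = (uint32_t)((v >> XICS_SRC_DEST_SHIFT) & XICS_SRC_DEST_MASK),
		.priority = (uint8_t)((v >> XICS_SRC_PRIO_SHIFT) & XICS_SRC_PRIO_MASK),
		.level    = (v & XICS_SRC_LEVEL) != 0,
		.masked   = (v & XICS_SRC_MASKED) != 0,
		.pending  = (v & XICS_SRC_PENDING) != 0,
	};
	return s;
}

/* A source can be delivered only if it is unmasked and its priority
 * is not 255, as stated in the description above. */
static inline bool xics_src_deliverable(uint64_t v)
{
	uint64_t prio = (v >> XICS_SRC_PRIO_SHIFT) & XICS_SRC_PRIO_MASK;

	return !(v & XICS_SRC_MASKED) && prio != 255;
}
```

The packed value would be written through the KVM_DEV_XICS_GRP_SOURCES group with the source number as the attribute. That device-attribute call is not shown here.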

+99 -57

Documentation/virt/kvm/devices/xive.txt Documentation/virt/kvm/devices/xive.rst

··· 1 + .. SPDX-License-Identifier: GPL-2.0 2 + 3 + =========================================================== 1 4 POWER9 eXternal Interrupt Virtualization Engine (XIVE Gen1) 2 - ========================================================== 5 + =========================================================== 3 6 4 7 Device types supported: 5 - KVM_DEV_TYPE_XIVE POWER9 XIVE Interrupt Controller generation 1 8 + - KVM_DEV_TYPE_XIVE POWER9 XIVE Interrupt Controller generation 1 6 9 7 10 This device acts as a VM interrupt controller. It provides the KVM 8 11 interface to configure the interrupt sources of a VM in the underlying ··· 67 64 68 65 * Groups: 69 66 70 - 1. KVM_DEV_XIVE_GRP_CTRL 71 - Provides global controls on the device 67 + 1. KVM_DEV_XIVE_GRP_CTRL 68 + Provides global controls on the device 69 + 72 70 Attributes: 73 71 1.1 KVM_DEV_XIVE_RESET (write only) 74 72 Resets the interrupt controller configuration for sources and event 75 73 queues. To be used by kexec and kdump. 74 + 76 75 Errors: none 77 76 78 77 1.2 KVM_DEV_XIVE_EQ_SYNC (write only) 79 78 Sync all the sources and queues and mark the EQ pages dirty. This 80 79 to make sure that a consistent memory state is captured when 81 80 migrating the VM. 81 + 82 82 Errors: none 83 83 84 84 1.3 KVM_DEV_XIVE_NR_SERVERS (write only) 85 85 The kvm_device_attr.addr points to a __u32 value which is the number of 86 86 interrupt server numbers (ie, highest possible vcpu id plus one). 87 - Errors: 88 - -EINVAL: Value greater than KVM_MAX_VCPU_ID. 89 - -EFAULT: Invalid user pointer for attr->addr. 90 - -EBUSY: A vCPU is already connected to the device. 91 87 92 - 2. KVM_DEV_XIVE_GRP_SOURCE (write only) 93 - Initializes a new source in the XIVE device and mask it. 88 + Errors: 89 + 90 + ======= ========================================== 91 + -EINVAL Value greater than KVM_MAX_VCPU_ID. 92 + -EFAULT Invalid user pointer for attr->addr. 93 + -EBUSY A vCPU is already connected to the device. 
94 + ======= ========================================== 95 + 96 + 2. KVM_DEV_XIVE_GRP_SOURCE (write only) 97 + Initializes a new source in the XIVE device and mask it. 98 + 94 99 Attributes: 95 100 Interrupt source number (64-bit) 96 - The kvm_device_attr.addr points to a __u64 value: 97 - bits: | 63 .... 2 | 1 | 0 98 - values: | unused | level | type 101 + 102 + The kvm_device_attr.addr points to a __u64 value:: 103 + 104 + bits: | 63 .... 2 | 1 | 0 105 + values: | unused | level | type 106 + 99 107 - type: 0:MSI 1:LSI 100 108 - level: assertion level in case of an LSI. 101 - Errors: 102 - -E2BIG: Interrupt source number is out of range 103 - -ENOMEM: Could not create a new source block 104 - -EFAULT: Invalid user pointer for attr->addr. 105 - -ENXIO: Could not allocate underlying HW interrupt 106 109 107 - 3. KVM_DEV_XIVE_GRP_SOURCE_CONFIG (write only) 108 - Configures source targeting 110 + Errors: 111 + 112 + ======= ========================================== 113 + -E2BIG Interrupt source number is out of range 114 + -ENOMEM Could not create a new source block 115 + -EFAULT Invalid user pointer for attr->addr. 116 + -ENXIO Could not allocate underlying HW interrupt 117 + ======= ========================================== 118 + 119 + 3. KVM_DEV_XIVE_GRP_SOURCE_CONFIG (write only) 120 + Configures source targeting 121 + 109 122 Attributes: 110 123 Interrupt source number (64-bit) 111 - The kvm_device_attr.addr points to a __u64 value: 112 - bits: | 63 .... 33 | 32 | 31 .. 3 | 2 .. 0 113 - values: | eisn | mask | server | priority 124 + 125 + The kvm_device_attr.addr points to a __u64 value:: 126 + 127 + bits: | 63 .... 33 | 32 | 31 .. 3 | 2 .. 
0 128 + values: | eisn | mask | server | priority 129 + 114 130 - priority: 0-7 interrupt priority level 115 131 - server: CPU number chosen to handle the interrupt 116 132 - mask: mask flag (unused) 117 133 - eisn: Effective Interrupt Source Number 118 - Errors: 119 - -ENOENT: Unknown source number 120 - -EINVAL: Not initialized source number 121 - -EINVAL: Invalid priority 122 - -EINVAL: Invalid CPU number. 123 - -EFAULT: Invalid user pointer for attr->addr. 124 - -ENXIO: CPU event queues not configured or configuration of the 125 - underlying HW interrupt failed 126 - -EBUSY: No CPU available to serve interrupt 127 134 128 - 4. KVM_DEV_XIVE_GRP_EQ_CONFIG (read-write) 129 - Configures an event queue of a CPU 135 + Errors: 136 + 137 + ======= ======================================================= 138 + -ENOENT Unknown source number 139 + -EINVAL Not initialized source number 140 + -EINVAL Invalid priority 141 + -EINVAL Invalid CPU number. 142 + -EFAULT Invalid user pointer for attr->addr. 143 + -ENXIO CPU event queues not configured or configuration of the 144 + underlying HW interrupt failed 145 + -EBUSY No CPU available to serve interrupt 146 + ======= ======================================================= 147 + 148 + 4. KVM_DEV_XIVE_GRP_EQ_CONFIG (read-write) 149 + Configures an event queue of a CPU 150 + 130 151 Attributes: 131 152 EQ descriptor identifier (64-bit) 132 - The EQ descriptor identifier is a tuple (server, priority) : 133 - bits: | 63 .... 32 | 31 .. 3 | 2 .. 0 134 - values: | unused | server | priority 135 - The kvm_device_attr.addr points to : 153 + 154 + The EQ descriptor identifier is a tuple (server, priority):: 155 + 156 + bits: | 63 .... 32 | 31 .. 3 | 2 .. 
0 157 + values: | unused | server | priority 158 + 159 + The kvm_device_attr.addr points to:: 160 + 136 161 struct kvm_ppc_xive_eq { 137 162 __u32 flags; 138 163 __u32 qshift; ··· 169 138 __u32 qindex; 170 139 __u8 pad[40]; 171 140 }; 141 + 172 142 - flags: queue flags 173 - KVM_XIVE_EQ_ALWAYS_NOTIFY (required) 143 + KVM_XIVE_EQ_ALWAYS_NOTIFY (required) 174 144 forces notification without using the coalescing mechanism 175 145 provided by the XIVE END ESBs. 176 146 - qshift: queue size (power of 2) ··· 179 147 - qtoggle: current queue toggle bit 180 148 - qindex: current queue index 181 149 - pad: reserved for future use 182 - Errors: 183 - -ENOENT: Invalid CPU number 184 - -EINVAL: Invalid priority 185 - -EINVAL: Invalid flags 186 - -EINVAL: Invalid queue size 187 - -EINVAL: Invalid queue address 188 - -EFAULT: Invalid user pointer for attr->addr. 189 - -EIO: Configuration of the underlying HW failed 190 150 191 - 5. KVM_DEV_XIVE_GRP_SOURCE_SYNC (write only) 192 - Synchronize the source to flush event notifications 151 + Errors: 152 + 153 + ======= ========================================= 154 + -ENOENT Invalid CPU number 155 + -EINVAL Invalid priority 156 + -EINVAL Invalid flags 157 + -EINVAL Invalid queue size 158 + -EINVAL Invalid queue address 159 + -EFAULT Invalid user pointer for attr->addr. 160 + -EIO Configuration of the underlying HW failed 161 + ======= ========================================= 162 + 163 + 5. KVM_DEV_XIVE_GRP_SOURCE_SYNC (write only) 164 + Synchronize the source to flush event notifications 165 + 193 166 Attributes: 194 167 Interrupt source number (64-bit) 168 + 195 169 Errors: 196 - -ENOENT: Unknown source number 197 - -EINVAL: Not initialized source number 170 + 171 + ======= ============================= 172 + -ENOENT Unknown source number 173 + -EINVAL Not initialized source number 174 + ======= ============================= 198 175 199 176 * VCPU state 200 177 ··· 216 175 as it synthesizes the priorities of the pending interrupts. 
We 217 176 capture a bit more to report debug information. 218 177 219 - KVM_REG_PPC_VP_STATE (2 * 64bits) 220 - bits: | 63 .... 32 | 31 .... 0 | 221 - values: | TIMA word0 | TIMA word1 | 222 - bits: | 127 .......... 64 | 223 - values: | unused | 178 + KVM_REG_PPC_VP_STATE (2 * 64bits):: 179 + 180 + bits: | 63 .... 32 | 31 .... 0 | 181 + values: | TIMA word0 | TIMA word1 | 182 + bits: | 127 .......... 64 | 183 + values: | unused | 224 184 225 185 * Migration: 226 186 ··· 238 196 3. Capture the state of the source targeting, the EQs configuration 239 197 and the state of thread interrupt context registers. 240 198 241 - Restore is similar : 199 + Restore is similar: 242 200 243 201 1. Restore the EQ configuration. As targeting depends on it. 244 202 2. Restore targeting
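The bit layouts given above for KVM_DEV_XIVE_GRP_SOURCE_CONFIG and for the EQ descriptor identifier can be sketched as small C helpers. The macro and function names below are assumptions made for this illustration, not the kernel's own definitions. The shifts and widths are read directly from the "bits/values" diagrams in the text.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names. Layout of the source configuration word:
 *   bits 63..33 eisn | bit 32 mask | bits 31..3 server | bits 2..0 priority */
#define XIVE_CFG_PRIO_MASK     0x7ULL          /* 3 bits  */
#define XIVE_CFG_SERVER_SHIFT  3
#define XIVE_CFG_SERVER_MASK   0x1fffffffULL   /* 29 bits */
#define XIVE_CFG_MASKED_SHIFT  32
#define XIVE_CFG_EISN_SHIFT    33
#define XIVE_CFG_EISN_MASK     0x7fffffffULL   /* 31 bits */

/* Build the __u64 that kvm_device_attr.addr points to for a source config. */
static inline uint64_t xive_source_config(uint32_t eisn, int masked,
					  uint32_t server, uint8_t priority)
{
	assert(priority <= XIVE_CFG_PRIO_MASK);
	assert(server <= XIVE_CFG_SERVER_MASK);
	assert(eisn <= XIVE_CFG_EISN_MASK);

	return ((uint64_t)eisn << XIVE_CFG_EISN_SHIFT) |
	       ((uint64_t)(masked != 0) << XIVE_CFG_MASKED_SHIFT) |
	       ((uint64_t)server << XIVE_CFG_SERVER_SHIFT) |
	       (uint64_t)priority;
}

static inline uint8_t xive_cfg_priority(uint64_t v)
{
	return (uint8_t)(v & XIVE_CFG_PRIO_MASK);
}

static inline uint32_t xive_cfg_server(uint64_t v)
{
	return (uint32_t)((v >> XIVE_CFG_SERVER_SHIFT) & XIVE_CFG_SERVER_MASK);
}

static inline uint32_t xive_cfg_eisn(uint64_t v)
{
	return (uint32_t)((v >> XIVE_CFG_EISN_SHIFT) & XIVE_CFG_EISN_MASK);
}

/* EQ descriptor identifier: bits 31..3 server, bits 2..0 priority. */
static inline uint64_t xive_eq_id(uint32_t server, uint8_t priority)
{
	assert(priority <= XIVE_CFG_PRIO_MASK);
	assert(server <= XIVE_CFG_SERVER_MASK);

	return ((uint64_t)server << XIVE_CFG_SERVER_SHIFT) | (uint64_t)priority;
}
```

The asserts reject values that would spill into a neighbouring field. Real callers would more likely return an error instead of aborting.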

+45 -41

Documentation/virt/kvm/halt-polling.txt Documentation/virt/kvm/halt-polling.rst

··· 1 + .. SPDX-License-Identifier: GPL-2.0 2 + 3 + =========================== 1 4 The KVM halt polling system 2 5 =========================== 3 6 ··· 71 68 which come at an approximately constant rate, otherwise there will be constant 72 69 adjustment of the polling interval. 73 70 74 - [0] total block time: the time between when the halt polling function is 71 + [0] total block time: 72 + the time between when the halt polling function is 75 73 invoked and a wakeup source received (irrespective of 76 74 whether the scheduler is invoked within that function). 77 75 ··· 85 81 parameters in virt/kvm/kvm_main.c, or arch/powerpc/kvm/book3s_hv.c in the 86 82 powerpc kvm-hv case. 87 83 88 - Module Parameter | Description | Default Value 89 - -------------------------------------------------------------------------------- 90 - halt_poll_ns | The global max polling | KVM_HALT_POLL_NS_DEFAULT 91 - | interval which defines | 92 - | the ceiling value of the | 93 - | polling interval for | (per arch value) 94 - | each vcpu. | 95 - -------------------------------------------------------------------------------- 96 - halt_poll_ns_grow | The value by which the | 2 97 - | halt polling interval is | 98 - | multiplied in the | 99 - | grow_halt_poll_ns() | 100 - | function. | 101 - -------------------------------------------------------------------------------- 102 - halt_poll_ns_grow_start | The initial value to grow | 10000 103 - | to from zero in the | 104 - | grow_halt_poll_ns() | 105 - | function. | 106 - -------------------------------------------------------------------------------- 107 - halt_poll_ns_shrink | The value by which the | 0 108 - | halt polling interval is | 109 - | divided in the | 110 - | shrink_halt_poll_ns() | 111 - | function. 
| 112 - -------------------------------------------------------------------------------- 84 + +-----------------------+---------------------------+-------------------------+ 85 + |Module Parameter | Description | Default Value | 86 + +-----------------------+---------------------------+-------------------------+ 87 + |halt_poll_ns | The global max polling | KVM_HALT_POLL_NS_DEFAULT| 88 + | | interval which defines | | 89 + | | the ceiling value of the | | 90 + | | polling interval for | (per arch value) | 91 + | | each vcpu. | | 92 + +-----------------------+---------------------------+-------------------------+ 93 + |halt_poll_ns_grow | The value by which the | 2 | 94 + | | halt polling interval is | | 95 + | | multiplied in the | | 96 + | | grow_halt_poll_ns() | | 97 + | | function. | | 98 + +-----------------------+---------------------------+-------------------------+ 99 + |halt_poll_ns_grow_start| The initial value to grow | 10000 | 100 + | | to from zero in the | | 101 + | | grow_halt_poll_ns() | | 102 + | | function. | | 103 + +-----------------------+---------------------------+-------------------------+ 104 + |halt_poll_ns_shrink | The value by which the | 0 | 105 + | | halt polling interval is | | 106 + | | divided in the | | 107 + | | shrink_halt_poll_ns() | | 108 + | | function. | | 109 + +-----------------------+---------------------------+-------------------------+ 113 110 114 111 These module parameters can be set from the debugfs files in: 115 112 ··· 122 117 Further Notes 123 118 ============= 124 119 125 - - Care should be taken when setting the halt_poll_ns module parameter as a 126 - large value has the potential to drive the cpu usage to 100% on a machine which 127 - would be almost entirely idle otherwise. 
This is because even if a guest has 128 - wakeups during which very little work is done and which are quite far apart, if 129 - the period is shorter than the global max polling interval (halt_poll_ns) then 130 - the host will always poll for the entire block time and thus cpu utilisation 131 - will go to 100%. 120 + - Care should be taken when setting the halt_poll_ns module parameter as a large value 121 + has the potential to drive the cpu usage to 100% on a machine which would be almost 122 + entirely idle otherwise. This is because even if a guest has wakeups during which very 123 + little work is done and which are quite far apart, if the period is shorter than the 124 + global max polling interval (halt_poll_ns) then the host will always poll for the 125 + entire block time and thus cpu utilisation will go to 100%. 132 126 133 - - Halt polling essentially presents a trade off between power usage and latency 134 - and the module parameters should be used to tune the affinity for this. Idle 135 - cpu time is essentially converted to host kernel time with the aim of decreasing 136 - latency when entering the guest. 127 + - Halt polling essentially presents a trade off between power usage and latency and 128 + the module parameters should be used to tune the affinity for this. Idle cpu time is 129 + essentially converted to host kernel time with the aim of decreasing latency when 130 + entering the guest. 137 131 138 - - Halt polling will only be conducted by the host when no other tasks are 139 - runnable on that cpu, otherwise the polling will cease immediately and 140 - schedule will be invoked to allow that other task to run. Thus this doesn't 141 - allow a guest to denial of service the cpu. 132 + - Halt polling will only be conducted by the host when no other tasks are runnable on 133 + that cpu, otherwise the polling will cease immediately and schedule will be invoked to 134 + allow that other task to run. 
Thus this doesn't allow a guest to mount a denial of service attack on the 135 + cpu.
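To make the effect of the module parameters in the table above more concrete, here is a minimal C sketch of how a per-vcpu polling interval might be grown and shrunk. It is not the kernel implementation: `struct halt_poll_params`, `grow_interval()` and `shrink_interval()` are names invented for this example, and two behaviours are assumptions of the sketch rather than statements from the table: a grow factor of 0 leaves the interval unchanged, and a shrink divisor of 0 resets the interval to zero.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical container for the four module parameters. */
struct halt_poll_params {
	uint64_t max_ns;      /* halt_poll_ns: ceiling of the interval */
	uint64_t grow;        /* halt_poll_ns_grow: multiplier */
	uint64_t grow_start;  /* halt_poll_ns_grow_start: first non-zero value */
	uint64_t shrink;      /* halt_poll_ns_shrink: divisor */
};

/* Grow the interval: multiply, start from grow_start when the result
 * is still below it, and never exceed the global maximum. */
static inline uint64_t grow_interval(uint64_t cur,
				     const struct halt_poll_params *p)
{
	uint64_t next;

	assert(p != NULL);
	if (p->grow == 0)       /* assumption: 0 means "do not grow" */
		return cur;

	next = cur * p->grow;
	if (next < p->grow_start)
		next = p->grow_start;
	if (next > p->max_ns)
		next = p->max_ns;
	return next;
}

/* Shrink the interval: divide by the shrink parameter. */
static inline uint64_t shrink_interval(uint64_t cur,
				       const struct halt_poll_params *p)
{
	assert(p != NULL);
	if (p->shrink == 0)     /* assumption: 0 means "reset to zero" */
		return 0;
	return cur / p->shrink;
}
```

With the defaults from the table (grow 2, grow_start 10000, shrink 0) and an example ceiling of 200000 ns, the interval in this sketch would step 0, 10000, 20000, 40000 and so on until it reaches the ceiling, and a single shrink would drop it back to 0.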

+73 -56

Documentation/virt/kvm/hypercalls.txt Documentation/virt/kvm/hypercalls.rst

··· 1 - Linux KVM Hypercall: 1 + .. SPDX-License-Identifier: GPL-2.0 2 + 2 3 =================== 4 + Linux KVM Hypercall 5 + =================== 6 + 3 7 X86: 4 8 KVM Hypercalls have a three-byte sequence of either the vmcall or the vmmcall 5 9 instruction. The hypervisor can replace it with instructions that are ··· 24 20 For further information on the S390 diagnose call as supported by KVM, 25 21 refer to Documentation/virt/kvm/s390-diag.txt. 26 22 27 - PowerPC: 23 + PowerPC: 28 24 It uses R3-R10 and hypercall number in R11. R4-R11 are used as output registers. 29 25 Return value is placed in R3. 30 26 ··· 38 34 the return value is placed in $2 (v0). 39 35 40 36 KVM Hypercalls Documentation 41 - =========================== 37 + ============================ 38 + 42 39 The template for each hypercall is: 43 40 1. Hypercall name. 44 41 2. Architecture(s) ··· 48 43 49 44 1. KVM_HC_VAPIC_POLL_IRQ 50 45 ------------------------ 51 - Architecture: x86 52 - Status: active 53 - Purpose: Trigger guest exit so that the host can check for pending 54 - interrupts on reentry. 46 + 47 + :Architecture: x86 48 + :Status: active 49 + :Purpose: Trigger guest exit so that the host can check for pending 50 + interrupts on reentry. 55 51 56 52 2. KVM_HC_MMU_OP 57 - ------------------------ 58 - Architecture: x86 59 - Status: deprecated. 60 - Purpose: Support MMU operations such as writing to PTE, 61 - flushing TLB, release PT. 53 + ---------------- 54 + 55 + :Architecture: x86 56 + :Status: deprecated. 57 + :Purpose: Support MMU operations such as writing to PTE, 58 + flushing TLB, release PT. 62 59 63 60 3. KVM_HC_FEATURES 64 - ------------------------ 65 - Architecture: PPC 66 - Status: active 67 - Purpose: Expose hypercall availability to the guest. On x86 platforms, cpuid 68 - used to enumerate which hypercalls are available. 
On PPC, either device tree 69 - based lookup ( which is also what EPAPR dictates) OR KVM specific enumeration 70 - mechanism (which is this hypercall) can be used. 61 + ------------------ 62 + 63 + :Architecture: PPC 64 + :Status: active 65 + :Purpose: Expose hypercall availability to the guest. On x86 platforms, cpuid 66 + used to enumerate which hypercalls are available. On PPC, either 67 + device tree based lookup ( which is also what EPAPR dictates) 68 + OR KVM specific enumeration mechanism (which is this hypercall) 69 + can be used. 71 70 72 71 4. KVM_HC_PPC_MAP_MAGIC_PAGE 73 - ------------------------ 74 - Architecture: PPC 75 - Status: active 76 - Purpose: To enable communication between the hypervisor and guest there is a 77 - shared page that contains parts of supervisor visible register state. 78 - The guest can map this shared page to access its supervisor register through 79 - memory using this hypercall. 72 + ---------------------------- 73 + 74 + :Architecture: PPC 75 + :Status: active 76 + :Purpose: To enable communication between the hypervisor and guest there is a 77 + shared page that contains parts of supervisor visible register state. 78 + The guest can map this shared page to access its supervisor register 79 + through memory using this hypercall. 80 80 81 81 5. KVM_HC_KICK_CPU 82 - ------------------------ 83 - Architecture: x86 84 - Status: active 85 - Purpose: Hypercall used to wakeup a vcpu from HLT state 86 - Usage example : A vcpu of a paravirtualized guest that is busywaiting in guest 87 - kernel mode for an event to occur (ex: a spinlock to become available) can 88 - execute HLT instruction once it has busy-waited for more than a threshold 89 - time-interval. Execution of HLT instruction would cause the hypervisor to put 90 - the vcpu to sleep until occurrence of an appropriate event. 
Another vcpu of the 91 - same guest can wakeup the sleeping vcpu by issuing KVM_HC_KICK_CPU hypercall, 92 - specifying APIC ID (a1) of the vcpu to be woken up. An additional argument (a0) 93 - is used in the hypercall for future use. 82 + ------------------ 83 + 84 + :Architecture: x86 85 + :Status: active 86 + :Purpose: Hypercall used to wakeup a vcpu from HLT state 87 + :Usage example: 88 + A vcpu of a paravirtualized guest that is busywaiting in guest 89 + kernel mode for an event to occur (ex: a spinlock to become available) can 90 + execute HLT instruction once it has busy-waited for more than a threshold 91 + time-interval. Execution of HLT instruction would cause the hypervisor to put 92 + the vcpu to sleep until occurrence of an appropriate event. Another vcpu of the 93 + same guest can wakeup the sleeping vcpu by issuing KVM_HC_KICK_CPU hypercall, 94 + specifying APIC ID (a1) of the vcpu to be woken up. An additional argument (a0) 95 + is used in the hypercall for future use. 94 96 95 97 96 98 6. KVM_HC_CLOCK_PAIRING 97 - ------------------------ 98 - Architecture: x86 99 - Status: active 100 - Purpose: Hypercall used to synchronize host and guest clocks. 99 + ----------------------- 100 + :Architecture: x86 101 + :Status: active 102 + :Purpose: Hypercall used to synchronize host and guest clocks. 103 + 101 104 Usage: 102 105 103 106 a0: guest physical address where host copies ··· 113 100 114 101 a1: clock_type, ATM only KVM_CLOCK_PAIRING_WALLCLOCK (0) 115 102 is supported (corresponding to the host's CLOCK_REALTIME clock). 103 + 104 + :: 116 105 117 106 struct kvm_clock_pairing { 118 107 __s64 sec; ··· 138 123 or if clock type is different than KVM_CLOCK_PAIRING_WALLCLOCK. 139 124 140 125 6. KVM_HC_SEND_IPI 141 - ------------------------ 142 - Architecture: x86 143 - Status: active 144 - Purpose: Send IPIs to multiple vCPUs. 
126 + ------------------ 145 127 146 - a0: lower part of the bitmap of destination APIC IDs 147 - a1: higher part of the bitmap of destination APIC IDs 148 - a2: the lowest APIC ID in bitmap 149 - a3: APIC ICR 128 + :Architecture: x86 129 + :Status: active 130 + :Purpose: Send IPIs to multiple vCPUs. 131 + 132 + - a0: lower part of the bitmap of destination APIC IDs 133 + - a1: higher part of the bitmap of destination APIC IDs 134 + - a2: the lowest APIC ID in bitmap 135 + - a3: APIC ICR 150 136 151 137 The hypercall lets a guest send multicast IPIs, with at most 128 152 138 128 destinations per hypercall in 64-bit mode and 64 vCPUs per ··· 159 143 Returns the number of CPUs to which the IPIs were delivered successfully. 160 144 161 145 7. KVM_HC_SCHED_YIELD 162 - ------------------------ 163 - Architecture: x86 164 - Status: active 165 - Purpose: Hypercall used to yield if the IPI target vCPU is preempted 146 + --------------------- 147 + 148 + :Architecture: x86 149 + :Status: active 150 + :Purpose: Hypercall used to yield if the IPI target vCPU is preempted 166 151 167 152 a0: destination APIC ID 168 153 169 - Usage example: When sending a call-function IPI-many to vCPUs, yield if 170 - any of the IPI target vCPUs was preempted. 154 + :Usage example: When sending a call-function IPI-many to vCPUs, yield if 155 + any of the IPI target vCPUs was preempted.
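The KVM_HC_SEND_IPI argument layout described above (a0 and a1 forming a 128-bit destination bitmap relative to the lowest APIC ID in a2) can be illustrated with a short C helper. This is a sketch under stated assumptions: `struct ipi_args` and `build_ipi_args()` are hypothetical names, the helper only covers the 64-bit case with at most 128 destinations per call, and it simply fails when the APIC IDs do not fit into one bitmap instead of splitting them into several hypercalls. The ICR value (a3) and the hypercall instruction itself are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical holder for the first three hypercall arguments. */
struct ipi_args {
	uint64_t a0;  /* lower 64 bits of the destination bitmap */
	uint64_t a1;  /* higher 64 bits of the destination bitmap */
	uint32_t a2;  /* lowest APIC ID represented in the bitmap */
};

/* Fill *out from a list of destination APIC IDs.
 * Returns 0 on success, -1 if the list is empty or the IDs span
 * more than 128 consecutive values. */
static inline int build_ipi_args(const uint32_t *ids, size_t n,
				 struct ipi_args *out)
{
	uint32_t min;
	size_t i;

	assert(ids != NULL && out != NULL);
	if (n == 0)
		return -1;

	/* Bit 0 of the bitmap corresponds to the lowest APIC ID. */
	min = ids[0];
	for (i = 1; i < n; i++)
		if (ids[i] < min)
			min = ids[i];

	out->a0 = 0;
	out->a1 = 0;
	out->a2 = min;

	for (i = 0; i < n; i++) {
		uint32_t off = ids[i] - min;

		if (off >= 128)
			return -1;
		if (off < 64)
			out->a0 |= 1ULL << off;
		else
			out->a1 |= 1ULL << (off - 64);
	}
	return 0;
}
```

A caller that hits the -1 case would need to partition the destinations into ranges of at most 128 IDs and issue one hypercall per range.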

+16

Documentation/virt/kvm/index.rst

··· 7 7 .. toctree:: 8 8 :maxdepth: 2 9 9 10 + api 10 11 amd-memory-encryption 11 12 cpuid 13 + halt-polling 14 + hypercalls 15 + locking 16 + mmu 17 + msr 18 + nested-vmx 19 + ppc-pv 20 + s390-diag 21 + timekeeping 12 22 vcpu-requests 23 + 24 + review-checklist 25 + 26 + arm/index 27 + 28 + devices/index

+243

Documentation/virt/kvm/locking.rst

··· 1 + .. SPDX-License-Identifier: GPL-2.0 2 + 3 + ================= 4 + KVM Lock Overview 5 + ================= 6 + 7 + 1. Acquisition Orders 8 + --------------------- 9 + 10 + The acquisition orders for mutexes are as follows: 11 + 12 + - kvm->lock is taken outside vcpu->mutex 13 + 14 + - kvm->lock is taken outside kvm->slots_lock and kvm->irq_lock 15 + 16 + - kvm->slots_lock is taken outside kvm->irq_lock, though acquiring 17 + them together is quite rare. 18 + 19 + On x86, vcpu->mutex is taken outside kvm->arch.hyperv.hv_lock. 20 + 21 + Everything else is a leaf: no other lock is taken inside the critical 22 + sections. 23 + 24 + 2. Exception 25 + ------------ 26 + 27 + Fast page fault: 28 + 29 + Fast page fault is the fast path which fixes the guest page fault out of 30 + the mmu-lock on x86. Currently, the page fault can be fast in one of the 31 + following two cases: 32 + 33 + 1. Access Tracking: The SPTE is not present, but it is marked for access 34 + tracking i.e. the SPTE_SPECIAL_MASK is set. That means we need to 35 + restore the saved R/X bits. This is described in more detail later below. 36 + 37 + 2. Write-Protection: The SPTE is present and the fault is 38 + caused by write-protect. That means we just need to change the W bit of 39 + the spte. 40 + 41 + What we use to avoid all the race is the SPTE_HOST_WRITEABLE bit and 42 + SPTE_MMU_WRITEABLE bit on the spte: 43 + 44 + - SPTE_HOST_WRITEABLE means the gfn is writable on host. 45 + - SPTE_MMU_WRITEABLE means the gfn is writable on mmu. The bit is set when 46 + the gfn is writable on guest mmu and it is not write-protected by shadow 47 + page write-protection. 48 + 49 + On fast page fault path, we will use cmpxchg to atomically set the spte W 50 + bit if spte.SPTE_HOST_WRITEABLE = 1 and spte.SPTE_WRITE_PROTECT = 1, or 51 + restore the saved R/X bits if VMX_EPT_TRACK_ACCESS mask is set, or both. This 52 + is safe because whenever changing these bits can be detected by cmpxchg. 
53 + 54 + But we need carefully check these cases: 55 + 56 + 1) The mapping from gfn to pfn 57 + 58 + The mapping from gfn to pfn may be changed since we can only ensure the pfn 59 + is not changed during cmpxchg. This is a ABA problem, for example, below case 60 + will happen: 61 + 62 + +------------------------------------------------------------------------+ 63 + | At the beginning:: | 64 + | | 65 + | gpte = gfn1 | 66 + | gfn1 is mapped to pfn1 on host | 67 + | spte is the shadow page table entry corresponding with gpte and | 68 + | spte = pfn1 | 69 + +------------------------------------------------------------------------+ 70 + | On fast page fault path: | 71 + +------------------------------------+-----------------------------------+ 72 + | CPU 0: | CPU 1: | 73 + +------------------------------------+-----------------------------------+ 74 + | :: | | 75 + | | | 76 + | old_spte = *spte; | | 77 + +------------------------------------+-----------------------------------+ 78 + | | pfn1 is swapped out:: | 79 + | | | 80 + | | spte = 0; | 81 + | | | 82 + | | pfn1 is re-alloced for gfn2. | 83 + | | | 84 + | | gpte is changed to point to | 85 + | | gfn2 by the guest:: | 86 + | | | 87 + | | spte = pfn1; | 88 + +------------------------------------+-----------------------------------+ 89 + | :: | 90 + | | 91 + | if (cmpxchg(spte, old_spte, old_spte+W) | 92 + | mark_page_dirty(vcpu->kvm, gfn1) | 93 + | OOPS!!! | 94 + +------------------------------------------------------------------------+ 95 + 96 + We dirty-log for gfn1, that means gfn2 is lost in dirty-bitmap. 97 + 98 + For direct sp, we can easily avoid it since the spte of direct sp is fixed 99 + to gfn. For indirect sp, before we do cmpxchg, we call gfn_to_pfn_atomic() 100 + to pin gfn to pfn, because after gfn_to_pfn_atomic(): 101 + 102 + - We have held the refcount of pfn that means the pfn can not be freed and 103 + be reused for another gfn. 
104 + - The pfn is writable that means it can not be shared between different gfns
105 +   by KSM.
106 +
107 + Then, we can ensure the dirty bitmaps is correctly set for a gfn.
108 +
109 + Currently, to simplify the whole things, we disable fast page fault for
110 + indirect shadow page.
111 +
112 + 2) Dirty bit tracking
113 +
114 + In the origin code, the spte can be fast updated (non-atomically) if the
115 + spte is read-only and the Accessed bit has already been set since the
116 + Accessed bit and Dirty bit can not be lost.
117 +
118 + But it is not true after fast page fault since the spte can be marked
119 + writable between reading spte and updating spte. Like below case:
120 +
121 + +------------------------------------------------------------------------+
122 + | At the beginning::                                                     |
123 + |                                                                        |
124 + |    spte.W = 0                                                          |
125 + |    spte.Accessed = 1                                                   |
126 + +------------------------------------+-----------------------------------+
127 + | CPU 0:                             | CPU 1:                            |
128 + +------------------------------------+-----------------------------------+
129 + | In mmu_spte_clear_track_bits()::   |                                   |
130 + |                                    |                                   |
131 + |  old_spte = *spte;                 |                                   |
132 + |                                    |                                   |
133 + |                                    |                                   |
134 + |  /* 'if' condition is satisfied. */|                                   |
135 + |  if (old_spte.Accessed == 1 &&     |                                   |
136 + |      old_spte.W == 0)              |                                   |
137 + |     spte = 0ull;                   |                                   |
138 + +------------------------------------+-----------------------------------+
139 + |                                    | on fast page fault path::         |
140 + |                                    |                                   |
141 + |                                    |    spte.W = 1                     |
142 + |                                    |                                   |
143 + |                                    | memory write on the spte::        |
144 + |                                    |                                   |
145 + |                                    |    spte.Dirty = 1                 |
146 + +------------------------------------+-----------------------------------+
147 + | ::                                 |                                   |
148 + |                                    |                                   |
149 + |  else                              |                                   |
150 + |    old_spte = xchg(spte, 0ull)     |                                   |
151 + |  if (old_spte.Accessed == 1)       |                                   |
152 + |     kvm_set_pfn_accessed(spte.pfn);|                                   |
153 + |  if (old_spte.Dirty == 1)          |                                   |
154 + |     kvm_set_pfn_dirty(spte.pfn);   |                                   |
155 + |     OOPS!!!                        |                                   |
156 + +------------------------------------+-----------------------------------+
157 +
158 + The Dirty bit is lost in this case.
159 +
160 + In order to avoid this kind of issue, we always treat the spte as "volatile"
161 + if it can be updated out of mmu-lock, see spte_has_volatile_bits(), it means,
162 + the spte is always atomically updated in this case.
163 +
164 + 3) flush tlbs due to spte updated
165 +
166 + If the spte is updated from writable to readonly, we should flush all TLBs,
167 + otherwise rmap_write_protect will find a read-only spte, even though the
168 + writable spte might be cached on a CPU's TLB.
169 +
170 + As mentioned before, the spte can be updated to writable out of mmu-lock on
171 + fast page fault path, in order to easily audit the path, we see if TLBs need
172 + be flushed caused by this reason in mmu_spte_update() since this is a common
173 + function to update spte (present -> present).
174 +
175 + Since the spte is "volatile" if it can be updated out of mmu-lock, we always
176 + atomically update the spte, the race caused by fast page fault can be avoided,
177 + See the comments in spte_has_volatile_bits() and mmu_spte_update().
178 +
179 + Lockless Access Tracking:
180 +
181 + This is used for Intel CPUs that are using EPT but do not support the EPT A/D
182 + bits. In this case, when the KVM MMU notifier is called to track accesses to a
183 + page (via kvm_mmu_notifier_clear_flush_young), it marks the PTE as not-present
184 + by clearing the RWX bits in the PTE and storing the original R & X bits in
185 + some unused/ignored bits. In addition, the SPTE_SPECIAL_MASK is also set on the
186 + PTE (using the ignored bit 62). When the VM tries to access the page later on,
187 + a fault is generated and the fast page fault mechanism described above is used
188 + to atomically restore the PTE to a Present state. The W bit is not saved when
189 + the PTE is marked for access tracking and during restoration to the Present
190 + state, the W bit is set depending on whether or not it was a write access. If
191 + it wasn't, then the W bit will remain clear until a write access happens, at
192 + which time it will be set using the Dirty tracking mechanism described above.
193 +
194 + 3. Reference
195 + ------------
196 +
197 + :Name: kvm_lock
198 + :Type: mutex
199 + :Arch: any
200 + :Protects: - vm_list
201 +
202 + :Name: kvm_count_lock
203 + :Type: raw_spinlock_t
204 + :Arch: any
205 + :Protects: - hardware virtualization enable/disable
206 + :Comment: 'raw' because hardware enabling/disabling must be atomic /wrt
207 +           migration.
208 +
209 + :Name: kvm_arch::tsc_write_lock
210 + :Type: raw_spinlock
211 + :Arch: x86
212 + :Protects: - kvm_arch::{last_tsc_write,last_tsc_nsec,last_tsc_offset}
213 +            - tsc offset in vmcb
214 + :Comment: 'raw' because updating the tsc offsets must not be preempted.
215 +
216 + :Name: kvm->mmu_lock
217 + :Type: spinlock_t
218 + :Arch: any
219 + :Protects: -shadow page/shadow tlb entry
220 + :Comment: it is a spinlock since it is used in mmu notifier.
221 +
222 + :Name: kvm->srcu
223 + :Type: srcu lock
224 + :Arch: any
225 + :Protects: - kvm->memslots
226 +            - kvm->buses
227 + :Comment: The srcu read lock must be held while accessing memslots (e.g.
228 +           when using gfn_to_* functions) and while accessing in-kernel
229 +           MMIO/PIO address->device structure mapping (kvm->buses).
230 +           The srcu index can be stored in kvm_vcpu->srcu_idx per vcpu
231 +           if it is needed by multiple functions.
232 +
233 + :Name: blocked_vcpu_on_cpu_lock
234 + :Type: spinlock_t
235 + :Arch: x86
236 + :Protects: blocked_vcpu_on_cpu
237 + :Comment: This is a per-CPU lock and it is used for VT-d posted-interrupts.
238 +           When VT-d posted-interrupts is supported and the VM has assigned
239 +           devices, we put the blocked vCPU on the list blocked_vcpu_on_cpu
240 +           protected by blocked_vcpu_on_cpu_lock, when VT-d hardware issues
241 +           wakeup notification event since external interrupts from the
242 +           assigned devices happens, we will find the vCPU on the list to
243 +           wakeup.
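
Note on case 1) of locking.rst ("The mapping from gfn to pfn"): the ABA interleaving from the table can be replayed on a single thread. The sketch below is illustrative only. It uses C11 atomics in place of the kernel's cmpxchg(), and SPTE_W, fast_fault_fix() and replay_aba() are hypothetical names chosen for this example, not KVM code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical writable bit, chosen only for this illustration. */
#define SPTE_W 1ull

/* Hypothetical helper: set the W bit if the entry still equals old_spte. */
static int fast_fault_fix(_Atomic uint64_t *spte, uint64_t old_spte)
{
    return atomic_compare_exchange_strong(spte, &old_spte, old_spte | SPTE_W);
}

/*
 * Replays the interleaving from the table on one thread.
 * Returns 1 if the compare-exchange succeeded.
 */
static int replay_aba(void)
{
    _Atomic uint64_t spte = 0x1000;          /* stands for pfn1, mapping gfn1 */
    uint64_t old_spte = atomic_load(&spte);  /* CPU 0: old_spte = *spte */

    assert((old_spte & SPTE_W) == 0);        /* entry starts write-protected */

    atomic_store(&spte, 0);                  /* CPU 1: pfn1 is swapped out */
    atomic_store(&spte, 0x1000);             /* CPU 1: pfn1 reused for gfn2 */

    /* CPU 0: the value matches again, so the change goes undetected. */
    return fast_fault_fix(&spte, old_spte);
}
```

replay_aba() returns 1: the compare-exchange succeeds although the entry was cleared and rewritten in between. That is the situation the text guards against by pinning the pfn, or by limiting the fast path to direct shadow pages.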

-215

Documentation/virt/kvm/locking.txt

···
1 - KVM Lock Overview
2 - =================
3 -
4 - 1. Acquisition Orders
5 - ---------------------
6 -
7 - The acquisition orders for mutexes are as follows:
8 -
9 - - kvm->lock is taken outside vcpu->mutex
10 -
11 - - kvm->lock is taken outside kvm->slots_lock and kvm->irq_lock
12 -
13 - - kvm->slots_lock is taken outside kvm->irq_lock, though acquiring
14 -   them together is quite rare.
15 -
16 - On x86, vcpu->mutex is taken outside kvm->arch.hyperv.hv_lock.
17 -
18 - Everything else is a leaf: no other lock is taken inside the critical
19 - sections.
20 -
21 - 2: Exception
22 - ------------
23 -
24 - Fast page fault:
25 -
26 - Fast page fault is the fast path which fixes the guest page fault out of
27 - the mmu-lock on x86. Currently, the page fault can be fast in one of the
28 - following two cases:
29 -
30 - 1. Access Tracking: The SPTE is not present, but it is marked for access
31 - tracking i.e. the SPTE_SPECIAL_MASK is set. That means we need to
32 - restore the saved R/X bits. This is described in more detail later below.
33 -
34 - 2. Write-Protection: The SPTE is present and the fault is
35 - caused by write-protect. That means we just need to change the W bit of the
36 - spte.
37 -
38 - What we use to avoid all the race is the SPTE_HOST_WRITEABLE bit and
39 - SPTE_MMU_WRITEABLE bit on the spte:
40 - - SPTE_HOST_WRITEABLE means the gfn is writable on host.
41 - - SPTE_MMU_WRITEABLE means the gfn is writable on mmu. The bit is set when
42 -   the gfn is writable on guest mmu and it is not write-protected by shadow
43 -   page write-protection.
44 -
45 - On fast page fault path, we will use cmpxchg to atomically set the spte W
46 - bit if spte.SPTE_HOST_WRITEABLE = 1 and spte.SPTE_WRITE_PROTECT = 1, or
47 - restore the saved R/X bits if VMX_EPT_TRACK_ACCESS mask is set, or both. This
48 - is safe because whenever changing these bits can be detected by cmpxchg.
49 -
50 - But we need carefully check these cases:
51 - 1): The mapping from gfn to pfn
52 - The mapping from gfn to pfn may be changed since we can only ensure the pfn
53 - is not changed during cmpxchg. This is a ABA problem, for example, below case
54 - will happen:
55 -
56 - At the beginning:
57 - gpte = gfn1
58 - gfn1 is mapped to pfn1 on host
59 - spte is the shadow page table entry corresponding with gpte and
60 - spte = pfn1
61 -
62 -    VCPU 0                           VCPU0
63 - on fast page fault path:
64 -
65 -    old_spte = *spte;
66 -                                  pfn1 is swapped out:
67 -                                     spte = 0;
68 -
69 -                                  pfn1 is re-alloced for gfn2.
70 -
71 -                                  gpte is changed to point to
72 -                                  gfn2 by the guest:
73 -                                     spte = pfn1;
74 -
75 -    if (cmpxchg(spte, old_spte, old_spte+W)
76 -        mark_page_dirty(vcpu->kvm, gfn1)
77 -             OOPS!!!
78 -
79 - We dirty-log for gfn1, that means gfn2 is lost in dirty-bitmap.
80 -
81 - For direct sp, we can easily avoid it since the spte of direct sp is fixed
82 - to gfn. For indirect sp, before we do cmpxchg, we call gfn_to_pfn_atomic()
83 - to pin gfn to pfn, because after gfn_to_pfn_atomic():
84 - - We have held the refcount of pfn that means the pfn can not be freed and
85 -   be reused for another gfn.
86 - - The pfn is writable that means it can not be shared between different gfns
87 -   by KSM.
88 -
89 - Then, we can ensure the dirty bitmaps is correctly set for a gfn.
90 -
91 - Currently, to simplify the whole things, we disable fast page fault for
92 - indirect shadow page.
93 -
94 - 2): Dirty bit tracking
95 - In the origin code, the spte can be fast updated (non-atomically) if the
96 - spte is read-only and the Accessed bit has already been set since the
97 - Accessed bit and Dirty bit can not be lost.
98 -
99 - But it is not true after fast page fault since the spte can be marked
100 - writable between reading spte and updating spte. Like below case:
101 -
102 - At the beginning:
103 - spte.W = 0
104 - spte.Accessed = 1
105 -
106 -    VCPU 0                           VCPU0
107 - In mmu_spte_clear_track_bits():
108 -
109 -    old_spte = *spte;
110 -
111 -    /* 'if' condition is satisfied. */
112 -    if (old_spte.Accessed == 1 &&
113 -         old_spte.W == 0)
114 -       spte = 0ull;
115 -                                  on fast page fault path:
116 -                                     spte.W = 1
117 -                                  memory write on the spte:
118 -                                     spte.Dirty = 1
119 -
120 -
121 -    else
122 -       old_spte = xchg(spte, 0ull)
123 -
124 -
125 -    if (old_spte.Accessed == 1)
126 -       kvm_set_pfn_accessed(spte.pfn);
127 -    if (old_spte.Dirty == 1)
128 -       kvm_set_pfn_dirty(spte.pfn);
129 -       OOPS!!!
130 -
131 - The Dirty bit is lost in this case.
132 -
133 - In order to avoid this kind of issue, we always treat the spte as "volatile"
134 - if it can be updated out of mmu-lock, see spte_has_volatile_bits(), it means,
135 - the spte is always atomically updated in this case.
136 -
137 - 3): flush tlbs due to spte updated
138 - If the spte is updated from writable to readonly, we should flush all TLBs,
139 - otherwise rmap_write_protect will find a read-only spte, even though the
140 - writable spte might be cached on a CPU's TLB.
141 -
142 - As mentioned before, the spte can be updated to writable out of mmu-lock on
143 - fast page fault path, in order to easily audit the path, we see if TLBs need
144 - be flushed caused by this reason in mmu_spte_update() since this is a common
145 - function to update spte (present -> present).
146 -
147 - Since the spte is "volatile" if it can be updated out of mmu-lock, we always
148 - atomically update the spte, the race caused by fast page fault can be avoided,
149 - See the comments in spte_has_volatile_bits() and mmu_spte_update().
150 -
151 - Lockless Access Tracking:
152 -
153 - This is used for Intel CPUs that are using EPT but do not support the EPT A/D
154 - bits. In this case, when the KVM MMU notifier is called to track accesses to a
155 - page (via kvm_mmu_notifier_clear_flush_young), it marks the PTE as not-present
156 - by clearing the RWX bits in the PTE and storing the original R & X bits in
157 - some unused/ignored bits. In addition, the SPTE_SPECIAL_MASK is also set on the
158 - PTE (using the ignored bit 62). When the VM tries to access the page later on,
159 - a fault is generated and the fast page fault mechanism described above is used
160 - to atomically restore the PTE to a Present state. The W bit is not saved when
161 - the PTE is marked for access tracking and during restoration to the Present
162 - state, the W bit is set depending on whether or not it was a write access. If
163 - it wasn't, then the W bit will remain clear until a write access happens, at
164 - which time it will be set using the Dirty tracking mechanism described above.
165 -
166 - 3. Reference
167 - ------------
168 -
169 - Name: kvm_lock
170 - Type: mutex
171 - Arch: any
172 - Protects: - vm_list
173 -
174 - Name: kvm_count_lock
175 - Type: raw_spinlock_t
176 - Arch: any
177 - Protects: - hardware virtualization enable/disable
178 - Comment: 'raw' because hardware enabling/disabling must be atomic /wrt
179 -          migration.
180 -
181 - Name: kvm_arch::tsc_write_lock
182 - Type: raw_spinlock
183 - Arch: x86
184 - Protects: - kvm_arch::{last_tsc_write,last_tsc_nsec,last_tsc_offset}
185 -           - tsc offset in vmcb
186 - Comment: 'raw' because updating the tsc offsets must not be preempted.
187 -
188 - Name: kvm->mmu_lock
189 - Type: spinlock_t
190 - Arch: any
191 - Protects: -shadow page/shadow tlb entry
192 - Comment: it is a spinlock since it is used in mmu notifier.
193 -
194 - Name: kvm->srcu
195 - Type: srcu lock
196 - Arch: any
197 - Protects: - kvm->memslots
198 -           - kvm->buses
199 - Comment: The srcu read lock must be held while accessing memslots (e.g.
200 -          when using gfn_to_* functions) and while accessing in-kernel
201 -          MMIO/PIO address->device structure mapping (kvm->buses).
202 -          The srcu index can be stored in kvm_vcpu->srcu_idx per vcpu
203 -          if it is needed by multiple functions.
204 -
205 - Name: blocked_vcpu_on_cpu_lock
206 - Type: spinlock_t
207 - Arch: x86
208 - Protects: blocked_vcpu_on_cpu
209 - Comment: This is a per-CPU lock and it is used for VT-d posted-interrupts.
210 -          When VT-d posted-interrupts is supported and the VM has assigned
211 -          devices, we put the blocked vCPU on the list blocked_vcpu_on_cpu
212 -          protected by blocked_vcpu_on_cpu_lock, when VT-d hardware issues
213 -          wakeup notification event since external interrupts from the
214 -          assigned devices happens, we will find the vCPU on the list to
215 -          wakeup.

+48 -14

Documentation/virt/kvm/mmu.txt Documentation/virt/kvm/mmu.rst

···
1 + .. SPDX-License-Identifier: GPL-2.0
2 +
3 + ======================
1 4 The x86 kvm shadow mmu
2 5 ======================
3 6
···
10 7
11 8 The mmu code attempts to satisfy the following requirements:
12 9
13 - - correctness: the guest should not be able to determine that it is running
10 + - correctness:
11 +     the guest should not be able to determine that it is running
14 12     on an emulated mmu except for timing (we attempt to comply
15 13     with the specification, not emulate the characteristics of
16 14     a particular implementation such as tlb size)
17 - - security: the guest must not be able to touch host memory not assigned
15 + - security:
16 +     the guest must not be able to touch host memory not assigned
18 17     to it
19 - - performance: minimize the performance penalty imposed by the mmu
20 - - scaling: need to scale to large memory and large vcpu guests
21 - - hardware: support the full range of x86 virtualization hardware
22 - - integration: Linux memory management code must be in control of guest memory
18 + - performance:
19 +     minimize the performance penalty imposed by the mmu
20 + - scaling:
21 +     need to scale to large memory and large vcpu guests
22 + - hardware:
23 +     support the full range of x86 virtualization hardware
24 + - integration:
25 +     Linux memory management code must be in control of guest memory
23 26     so that swapping, page migration, page merging, transparent
24 27     hugepages, and similar features work without change
25 - - dirty tracking: report writes to guest memory to enable live migration
28 + - dirty tracking:
29 +     report writes to guest memory to enable live migration
26 30     and framebuffer-based displays
27 - - footprint: keep the amount of pinned kernel memory low (most memory
31 + - footprint:
32 +     keep the amount of pinned kernel memory low (most memory
28 33     should be shrinkable)
29 - - reliability: avoid multipage or GFP_ATOMIC allocations
34 + - reliability:
35 +     avoid multipage or GFP_ATOMIC allocations
30 36
31 37 Acronyms
32 38 ========
33 39
40 + ==== ====================================================================
34 41 pfn  host page frame number
35 42 hpa  host physical address
36 43 hva  host virtual address
···
54 41 gpte guest pte (referring to gfns)
55 42 spte shadow pte (referring to pfns)
56 43 tdp  two dimensional paging (vendor neutral term for NPT and EPT)
44 + ==== ====================================================================
57 45
58 46 Virtual and real hardware supported
59 47 ===================================
···
104 90 The mmu is driven by events, some from the guest, some from the host.
105 91
106 92 Guest generated events:
93 +
107 94 - writes to control registers (especially cr3)
108 95 - invlpg/invlpga instruction execution
109 96 - access to missing or protected translations
110 97
111 98 Host generated events:
99 +
112 100 - changes in the gpa->hpa translation (either through gpa->hva changes or
113 101   through hva->hpa changes)
114 102 - memory pressure (the shrinker)
···
133 117 The following table shows translations encoded by leaf ptes, with higher-level
134 118 translations in parentheses:
135 119
136 - Non-nested guests:
120 + Non-nested guests::
121 +
137 122   nonpaging: gpa->hpa
138 123   paging: gva->gpa->hpa
139 124   paging, tdp: (gva->)gpa->hpa
140 - Nested guests:
125 +
126 + Nested guests::
127 +
141 128   non-tdp: ngva->gpa->hpa (*)
142 129   tdp: (ngva->)ngpa->gpa->hpa
143 130
144 - (*) the guest hypervisor will encode the ngva->gpa translation into its page
145 - tables if npt is not present
131 + (*) the guest hypervisor will encode the ngva->gpa translation into its page
132 + tables if npt is not present
146 133
147 134 Shadow pages contain the following information:
148 135 role.level:
···
310 291
311 292 - if the RSV bit of the error code is set, the page fault is caused by guest
312 293   accessing MMIO and cached MMIO information is available.
294 +
313 295   - walk shadow page table
314 296   - check for valid generation number in the spte (see "Fast invalidation of
315 297     MMIO sptes" below)
316 298   - cache the information to vcpu->arch.mmio_gva, vcpu->arch.mmio_access and
317 299     vcpu->arch.mmio_gfn, and call the emulator
300 +
318 301 - If both P bit and R/W bit of error code are set, this could possibly
319 302   be handled as a "fast page fault" (fixed without taking the MMU lock). See
320 303   the description in Documentation/virt/kvm/locking.txt.
304 +
321 305 - if needed, walk the guest page tables to determine the guest translation
322 306   (gva->gpa or ngpa->gpa)
307 +
323 308   - if permissions are insufficient, reflect the fault back to the guest
309 +
324 310 - determine the host page
311 +
325 312   - if this is an mmio request, there is no host page; cache the info to
326 313     vcpu->arch.mmio_gva, vcpu->arch.mmio_access and vcpu->arch.mmio_gfn
314 +
327 315 - walk the shadow page table to find the spte for the translation,
328 316   instantiating missing intermediate page tables as necessary
317 +
329 318   - If this is an mmio request, cache the mmio info to the spte and set some
330 319     reserved bit on the spte (see callers of kvm_mmu_set_mmio_spte_mask)
320 +
331 321 - try to unsynchronize the page
322 +
332 323   - if successful, we can let the guest continue and modify the gpte
324 +
333 325 - emulate the instruction
326 +
334 327   - if failed, unshadow the page and let the guest continue
328 +
335 329 - update any translations that were modified by the instruction
336 330
337 331 invlpg handling:
···
356 324 Guest control register updates:
357 325
358 326 - mov to cr3
327 +
359 328   - look up new shadow roots
360 329   - synchronize newly reachable shadow pages
361 330
362 331 - mov to cr0/cr4/efer
332 +
363 333   - set up mmu context for new paging mode
364 334   - look up new shadow roots
365 335   - synchronize newly reachable shadow pages
···
392 358 (user write faults generate a #PF)
393 359
394 360 In the first case there are two additional complications:
361 +
395 362 - if CR4.SMEP is enabled: since we've turned the page into a kernel page,
396 363   the kernel may now execute it. We handle this by also setting spte.nx.
397 364   If we get a user fetch or read fault, we'll change spte.u=1 and
···
481 446
482 447 - NPT presentation from KVM Forum 2008
483 448   http://www.linux-kvm.org/images/c/c8/KvmForum2008%24kdf2008_21.pdf
484 -

+100 -63

Documentation/virt/kvm/msr.txt Documentation/virt/kvm/msr.rst

···
1 - KVM-specific MSRs.
2 - Glauber Costa <glommer@redhat.com>, Red Hat Inc, 2010
3 - =====================================================
1 + .. SPDX-License-Identifier: GPL-2.0
2 +
3 + =================
4 + KVM-specific MSRs
5 + =================
6 +
7 + :Author: Glauber Costa <glommer@redhat.com>, Red Hat Inc, 2010
4 8
5 9 KVM makes use of some custom MSRs to service some requests.
6 10
···
13 9 but they are deprecated and their use is discouraged.
14 10
15 11 Custom MSR list
16 - --------
12 + ---------------
17 13
18 14 The current supported Custom MSR list is:
19 15
20 - MSR_KVM_WALL_CLOCK_NEW: 0x4b564d00
16 + MSR_KVM_WALL_CLOCK_NEW:
17 +     0x4b564d00
21 18
22 - data: 4-byte alignment physical address of a memory area which must be
19 + data:
20 +     4-byte alignment physical address of a memory area which must be
23 21     in guest RAM. This memory is expected to hold a copy of the following
24 -     structure:
22 +     structure::
25 23
26 -     struct pvclock_wall_clock {
24 +     struct pvclock_wall_clock {
27 25         u32 version;
28 26         u32 sec;
29 27         u32 nsec;
30 -     } __attribute__((__packed__));
28 +     } __attribute__((__packed__));
31 29
32 30     whose data will be filled in by the hypervisor. The hypervisor is only
33 31     guaranteed to update this data at the moment of MSR write.
34 32     Users that want to reliably query this information more than once have
35 33     to write more than once to this MSR. Fields have the following meanings:
36 34
37 -     version: guest has to check version before and after grabbing
35 +     version:
36 +         guest has to check version before and after grabbing
38 37         time information and check that they are both equal and even.
39 38         An odd version indicates an in-progress update.
40 39
41 -     sec: number of seconds for wallclock at time of boot.
40 +     sec:
41 +         number of seconds for wallclock at time of boot.
42 42
43 -     nsec: number of nanoseconds for wallclock at time of boot.
43 +     nsec:
44 +         number of nanoseconds for wallclock at time of boot.
44 45
45 46     In order to get the current wallclock time, the system_time from
46 47     MSR_KVM_SYSTEM_TIME_NEW needs to be added.
···
56 47     Availability of this MSR must be checked via bit 3 in 0x4000001 cpuid
57 48     leaf prior to usage.
58 49
59 - MSR_KVM_SYSTEM_TIME_NEW: 0x4b564d01
50 + MSR_KVM_SYSTEM_TIME_NEW:
51 +     0x4b564d01
60 52
61 - data: 4-byte aligned physical address of a memory area which must be in
53 + data:
54 +     4-byte aligned physical address of a memory area which must be in
62 55     guest RAM, plus an enable bit in bit 0. This memory is expected to hold
63 -     a copy of the following structure:
56 +     a copy of the following structure::
64 57
65 -     struct pvclock_vcpu_time_info {
58 +     struct pvclock_vcpu_time_info {
66 59         u32 version;
67 60         u32 pad0;
68 61         u64 tsc_timestamp;
···
73 62         s8 tsc_shift;
74 63         u8 flags;
75 64         u8 pad[2];
76 -     } __attribute__((__packed__)); /* 32 bytes */
65 +     } __attribute__((__packed__)); /* 32 bytes */
77 66
78 67     whose data will be filled in by the hypervisor periodically. Only one
79 68     write, or registration, is needed for each VCPU. The interval between
···
83 72
84 73     Fields have the following meanings:
85 74
86 -     version: guest has to check version before and after grabbing
75 +     version:
76 +         guest has to check version before and after grabbing
87 77         time information and check that they are both equal and even.
88 78         An odd version indicates an in-progress update.
89 79
90 -     tsc_timestamp: the tsc value at the current VCPU at the time
80 +     tsc_timestamp:
81 +         the tsc value at the current VCPU at the time
91 82         of the update of this structure. Guests can subtract this value
92 83         from current tsc to derive a notion of elapsed time since the
93 84         structure update.
94 85
95 -     system_time: a host notion of monotonic time, including sleep
86 +     system_time:
87 +         a host notion of monotonic time, including sleep
96 88         time at the time this structure was last updated. Unit is
97 89         nanoseconds.
98 90
99 -     tsc_to_system_mul: multiplier to be used when converting
91 +     tsc_to_system_mul:
92 +         multiplier to be used when converting
100 93         tsc-related quantity to nanoseconds
101 94
102 -     tsc_shift: shift to be used when converting tsc-related
95 +     tsc_shift:
96 +         shift to be used when converting tsc-related
103 97         quantity to nanoseconds. This shift will ensure that
104 98         multiplication with tsc_to_system_mul does not overflow.
105 99         A positive value denotes a left shift, a negative value
···
112 96
113 97         The conversion from tsc to nanoseconds involves an additional
114 98         right shift by 32 bits. With this information, guests can
115 -         derive per-CPU time by doing:
99 +         derive per-CPU time by doing::
116 100
117 101             time = (current_tsc - tsc_timestamp)
118 102             if (tsc_shift >= 0)
···
122 106             time = (time * tsc_to_system_mul) >> 32
123 107             time = time + system_time
124 108
125 -     flags: bits in this field indicate extended capabilities
109 +     flags:
110 +         bits in this field indicate extended capabilities
126 111         coordinated between the guest and the hypervisor. Availability
127 112         of specific flags has to be checked in 0x40000001 cpuid leaf.
128 113         Current flags are:
129 114
130 -  flag bit   | cpuid bit    | meaning
131 - -------------------------------------------------------------
132 -             |              | time measures taken across
133 -      0      |      24      | multiple cpus are guaranteed to
134 -             |              | be monotonic
135 - -------------------------------------------------------------
136 -             |              | guest vcpu has been paused by
137 -      1      |     N/A      | the host
138 -             |              | See 4.70 in api.txt
139 - -------------------------------------------------------------
115 +
116 + +-----------+--------------+----------------------------------+
117 + | flag bit  | cpuid bit    | meaning                          |
118 + +-----------+--------------+----------------------------------+
119 + |           |              | time measures taken across       |
120 + |    0      |      24      | multiple cpus are guaranteed to  |
121 + |           |              | be monotonic                     |
122 + +-----------+--------------+----------------------------------+
123 + |           |              | guest vcpu has been paused by    |
124 + |    1      |     N/A      | the host                         |
125 + |           |              | See 4.70 in api.txt              |
126 + +-----------+--------------+----------------------------------+
140 127
141 128     Availability of this MSR must be checked via bit 3 in 0x4000001 cpuid
142 129     leaf prior to usage.
143 130
144 131
145 - MSR_KVM_WALL_CLOCK: 0x11
132 + MSR_KVM_WALL_CLOCK:
133 +     0x11
146 134
147 - data and functioning: same as MSR_KVM_WALL_CLOCK_NEW. Use that instead.
148 -
149 - This MSR falls outside the reserved KVM range and may be removed in the
150 - future. Its usage is deprecated.
151 -
152 - Availability of this MSR must be checked via bit 0 in 0x4000001 cpuid
153 - leaf prior to usage.
154 -
155 - MSR_KVM_SYSTEM_TIME: 0x12
156 -
157 - data and functioning: same as MSR_KVM_SYSTEM_TIME_NEW. Use that instead.
135 + data and functioning:
136 +     same as MSR_KVM_WALL_CLOCK_NEW. Use that instead.
158 137
159 138     This MSR falls outside the reserved KVM range and may be removed in the
160 139     future. Its usage is deprecated.
···
157 146     Availability of this MSR must be checked via bit 0 in 0x4000001 cpuid
158 147     leaf prior to usage.
159 148
160 - The suggested algorithm for detecting kvmclock presence is then:
149 + MSR_KVM_SYSTEM_TIME:
150 +     0x12
151 +
152 + data and functioning:
153 +     same as MSR_KVM_SYSTEM_TIME_NEW. Use that instead.
154 +
155 +     This MSR falls outside the reserved KVM range and may be removed in the
156 +     future. Its usage is deprecated.
157 +
158 +     Availability of this MSR must be checked via bit 0 in 0x4000001 cpuid
159 +     leaf prior to usage.
160 +
161 + The suggested algorithm for detecting kvmclock presence is then::
161 162
162 163     if (!kvm_para_available()) /* refer to cpuid.txt */
163 164         return NON_PRESENT;
···
186 163     } else
187 164         return NON_PRESENT;
188 165
189 - MSR_KVM_ASYNC_PF_EN: 0x4b564d02
190 - data: Bits 63-6 hold 64-byte aligned physical address of a
166 + MSR_KVM_ASYNC_PF_EN:
167 +     0x4b564d02
168 +
169 + data:
170 +     Bits 63-6 hold 64-byte aligned physical address of a
191 171     64 byte memory area which must be in guest RAM and must be
192 172     zeroed. Bits 5-3 are reserved and should be zero. Bit 0 is 1
193 173     when asynchronous page faults are enabled on the vcpu 0 when
···
226 200     Currently type 2 APF will be always delivered on the same vcpu as
227 201     type 1 was, but guest should not rely on that.
228 202
229 - MSR_KVM_STEAL_TIME: 0x4b564d03
203 + MSR_KVM_STEAL_TIME:
204 +     0x4b564d03
230 205
231 - data: 64-byte alignment physical address of a memory area which must be
206 + data:
207 +     64-byte alignment physical address of a memory area which must be
232 208     in guest RAM, plus an enable bit in bit 0. This memory is expected to
233 -     hold a copy of the following structure:
209 +     hold a copy of the following structure::
234 210
235 -     struct kvm_steal_time {
211 +     struct kvm_steal_time {
236 212         __u64 steal;
237 213         __u32 version;
238 214         __u32 flags;
239 215         __u8 preempted;
240 216         __u8 u8_pad[3];
241 217         __u32 pad[11];
242 -     }
218 +     }
243 219
244 220     whose data will be filled in by the hypervisor periodically. Only one
245 221     write, or registration, is needed for each VCPU. The interval between
···
252 224
253 225     Fields have the following meanings:
254 226
255 -     version: a sequence counter. In other words, guest has to check
227 +     version:
228 +         a sequence counter. In other words, guest has to check
256 229         this field before and after grabbing time information and make
257 230         sure they are both equal and even. An odd version indicates an
258 231         in-progress update.
259 232
260 -     flags: At this point, always zero. May be used to indicate
233 +     flags:
234 +         At this point, always zero. May be used to indicate
261 235         changes in this structure in the future.
262 236
263 -     steal: the amount of time in which this vCPU did not run, in
237 +     steal:
238 +         the amount of time in which this vCPU did not run, in
264 239         nanoseconds. Time during which the vcpu is idle, will not be
265 240         reported as steal time.
266 241
267 -     preempted: indicate the vCPU who owns this struct is running or
242 +     preempted:
243 +         indicate the vCPU who owns this struct is running or
268 244         not. Non-zero values mean the vCPU has been preempted. Zero
269 245         means the vCPU is not preempted. NOTE, it is always zero if the
270 246         the hypervisor doesn't support this field.
271 247
272 - MSR_KVM_EOI_EN: 0x4b564d04
273 - data: Bit 0 is 1 when PV end of interrupt is enabled on the vcpu; 0
248 + MSR_KVM_EOI_EN:
249 +     0x4b564d04
250 +
251 + data:
252 +     Bit 0 is 1 when PV end of interrupt is enabled on the vcpu; 0
274 253     when disabled. Bit 1 is reserved and must be zero. When PV end of
275 254     interrupt is enabled (bit 0 set), bits 63-2 hold a 4-byte aligned
276 255     physical address of a 4 byte memory area which must be in guest RAM and
···
309 274     clear it using a single CPU instruction, such as test and clear, or
310 275     compare and exchange.
311 276
312 - MSR_KVM_POLL_CONTROL: 0x4b564d05
277 + MSR_KVM_POLL_CONTROL:
278 +     0x4b564d05
279 +
313 280     Control host-side polling.
314 281
315 - data: Bit 0 enables (1) or disables (0) host-side HLT polling logic.
282 + data:
283 +     Bit 0 enables (1) or disables (0) host-side HLT polling logic.
284
317 285     KVM guests can request the host not to poll on HLT, for example if
318 286     they are performing polling themselves.
319 -
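
Note on MSR_KVM_SYSTEM_TIME_NEW in msr.rst: the tsc-to-nanoseconds conversion and the version check described in the field list can be sketched in plain C. This is a minimal userspace illustration, not the kernel's implementation. The struct is restated with stdint types, the placement of the two members elided from the hunk (system_time, tsc_to_system_mul) is an assumption based on the field descriptions, and mul_shift32(), pvclock_ns() and pvclock_snapshot_ok() are hypothetical helper names.

```c
#include <assert.h>
#include <stdint.h>

/* Restated from the document; member order of the elided fields is assumed. */
struct pvclock_vcpu_time_info {
    uint32_t version;
    uint32_t pad0;
    uint64_t tsc_timestamp;
    uint64_t system_time;
    uint32_t tsc_to_system_mul;
    int8_t   tsc_shift;
    uint8_t  flags;
    uint8_t  pad[2];
} __attribute__((__packed__)); /* 32 bytes */

/*
 * (delta * mul) >> 32, computed in two halves so the intermediate
 * product does not need more than 64 bits.
 */
static uint64_t mul_shift32(uint64_t delta, uint32_t mul)
{
    uint64_t lo = (delta & 0xffffffffu) * mul;
    uint64_t hi = (delta >> 32) * mul;

    return hi + (lo >> 32);
}

/* Follows the documented steps: shift, scale, then add system_time. */
static uint64_t pvclock_ns(const struct pvclock_vcpu_time_info *info,
                           uint64_t current_tsc)
{
    uint64_t time = current_tsc - info->tsc_timestamp;

    assert(info->tsc_shift > -64 && info->tsc_shift < 64);
    if (info->tsc_shift >= 0)
        time <<= info->tsc_shift;
    else
        time >>= -info->tsc_shift;

    return mul_shift32(time, info->tsc_to_system_mul) + info->system_time;
}

/* A snapshot is usable only if both version reads are equal and even. */
static int pvclock_snapshot_ok(uint32_t before, uint32_t after)
{
    return before == after && (before & 1) == 0;
}
```

A guest would read version, copy the fields, read version again, and retry until pvclock_snapshot_ok() holds. The memory barriers a real reader needs between those steps are left out of this sketch.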

+21 -16

Documentation/virt/kvm/nested-vmx.txt Documentation/virt/kvm/nested-vmx.rst

···
1 + .. SPDX-License-Identifier: GPL-2.0
2 +
3 + ==========
1 4 Nested VMX
2 5 ==========
3 6
···
44 41 emulated CPU type (qemu64) does not list the "VMX" CPU feature, so it must be
45 42 explicitly enabled, by giving qemu one of the following options:
46 43
47 - -cpu host (emulated CPU has all features of the real CPU)
44 + - cpu host (emulated CPU has all features of the real CPU)
48 45
49 - -cpu qemu64,+vmx (add just the vmx feature to a named CPU type)
46 + - cpu qemu64,+vmx (add just the vmx feature to a named CPU type)
50 47
51 48
52 49 ABIs
···
77 74 of this structure changes, this can break live migration across KVM versions.
78 75 VMCS12_REVISION (from vmx.c) should be changed if struct vmcs12 or its inner
79 76 struct shadow_vmcs is ever changed.
77 +
78 + ::
80 79
81 80     typedef u64 natural_width;
82 81     struct __packed vmcs12 {
···
225 220 -------
226 221
227 222 These patches were written by:
228 - Abel Gordon, abelg <at> il.ibm.com
229 - Nadav Har'El, nyh <at> il.ibm.com
230 - Orit Wasserman, oritw <at> il.ibm.com
231 - Ben-Ami Yassor, benami <at> il.ibm.com
232 - Muli Ben-Yehuda, muli <at> il.ibm.com
223 + - Abel Gordon, abelg <at> il.ibm.com
224 + - Nadav Har'El, nyh <at> il.ibm.com
225 + - Orit Wasserman, oritw <at> il.ibm.com
226 + - Ben-Ami Yassor, benami <at> il.ibm.com
227 + - Muli Ben-Yehuda, muli <at> il.ibm.com
233 228
234 229 With contributions by:
235 - Anthony Liguori, aliguori <at> us.ibm.com
236 - Mike Day, mdday <at> us.ibm.com
237 - Michael Factor, factor <at> il.ibm.com
238 - Zvi Dubitzky, dubi <at> il.ibm.com
230 + - Anthony Liguori, aliguori <at> us.ibm.com
231 + - Mike Day, mdday <at> us.ibm.com
232 + - Michael Factor, factor <at> il.ibm.com
233 + - Zvi Dubitzky, dubi <at> il.ibm.com
239 234
240 235 And valuable reviews by:
241 - Avi Kivity, avi <at> redhat.com
242 - Gleb Natapov, gleb <at> redhat.com
243 - Marcelo Tosatti, mtosatti <at> redhat.com
244 - Kevin Tian, kevin.tian <at> intel.com
245 - and others.
236 + - Avi Kivity, avi <at> redhat.com 237 + - Gleb Natapov, gleb <at> redhat.com 238 + - Marcelo Tosatti, mtosatti <at> redhat.com 239 + - Kevin Tian, kevin.tian <at> intel.com 240 + - and others.

+18 -8

Documentation/virt/kvm/ppc-pv.txt Documentation/virt/kvm/ppc-pv.rst

··· 1 + .. SPDX-License-Identifier: GPL-2.0 2 + 3 + ================================= 1 4 The PPC KVM paravirtual interface 2 5 ================================= 3 6 ··· 37 34 38 35 The parameters are as follows: 39 36 37 + ======== ================ ================ 40 38 Register IN OUT 41 - 39 + ======== ================ ================ 42 40 r0 - volatile 43 41 r3 1st parameter Return code 44 42 r4 2nd parameter 1st output value ··· 51 47 r10 8th parameter 7th output value 52 48 r11 hypercall number 8th output value 53 49 r12 - volatile 50 + ======== ================ ================ 54 51 55 52 Hypercall definitions are shared in generic code, so the same hypercall numbers 56 53 apply for x86 and powerpc alike with the exception that each KVM hypercall ··· 59 54 60 55 Return codes can be as follows: 61 56 57 + ==== ========================= 62 58 Code Meaning 63 - 59 + ==== ========================= 64 60 0 Success 65 61 12 Hypercall not implemented 66 62 <0 Error 63 + ==== ========================= 67 64 68 65 The magic page 69 66 ============== ··· 79 72 MMU is enabled. The second parameter indicates the address in real mode, if 80 73 applicable to the target. For now, we always map the page to -4096. This way we 81 74 can access it using absolute load and store functions. The following 82 - instruction reads the first field of the magic page: 75 + instruction reads the first field of the magic page:: 83 76 84 77 ld rX, -4096(0) 85 78 ··· 100 93 101 94 The following enhancements to the magic page are currently available: 102 95 96 + ============================ ======================================= 103 97 KVM_MAGIC_FEAT_SR Maps SR registers r/w in the magic page 104 98 KVM_MAGIC_FEAT_MAS0_TO_SPRG7 Maps MASn, ESR, PIR and high SPRGs 99 + ============================ ======================================= 105 100 106 101 For enhanced features in the magic page, please check for the existence of the 107 102 feature before using them! 
··· 130 121 131 122 The following bits are safe to be set inside the guest: 132 123 133 - MSR_EE 134 - MSR_RI 124 + - MSR_EE 125 + - MSR_RI 135 126 136 127 If any other bit changes in the MSR, please still use mtmsr(d). 137 128 ··· 147 138 also act on the shared page. So calling privileged instructions still works as 148 139 before. 149 140 141 + ======================= ================================ 150 142 From To 151 - ==== == 152 - 143 + ======================= ================================ 153 144 mfmsr rX ld rX, magic_page->msr 154 145 mfsprg rX, 0 ld rX, magic_page->sprg0 155 146 mfsprg rX, 1 ld rX, magic_page->sprg1 ··· 182 173 183 174 [BookE only] 184 175 wrteei [0|1] b <special wrteei section> 185 - 176 + ======================= ================================ 186 177 187 178 Some instructions require more logic to determine what's going on than a load 188 179 or store instruction can deliver. To enable patching of those, we keep some ··· 200 191 201 192 Hypercall ABIs in KVM on PowerPC 202 193 ================================= 194 + 203 195 1) KVM hypercalls (ePAPR) 204 196 205 197 These are ePAPR compliant hypercall implementation (mentioned above). Even
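
As a rough illustration of the ppc-pv text above: mapping the magic page at -4096 lets `ld rX, -4096(0)` reach it, because the displacement wraps around to the top page of the address space and still fits in a signed 16-bit immediate. The sketch below checks that arithmetic and mirrors the return-code table. The function names are hypothetical and are not part of any KVM interface. Only the -4096 offset and the three return codes come from the document.

```python
# Hypothetical helper names; only the -4096 offset and the return-code
# table are taken from the documentation text.
MAGIC_PAGE_OFFSET = -4096


def effective_address(displacement: int, width_bits: int = 64) -> int:
    """Wrap a signed displacement from base 0 into an unsigned address."""
    return displacement % (1 << width_bits)


def fits_signed_16(value: int) -> bool:
    """True if value can be encoded as a signed 16-bit displacement."""
    return -(1 << 15) <= value < (1 << 15)


def describe_return_code(code: int) -> str:
    """Map a hypercall return code to the meaning listed in the table."""
    if code == 0:
        return "Success"
    if code == 12:
        return "Hypercall not implemented"
    if code < 0:
        return "Error"
    return "Unspecified"  # positive codes other than 12 are not in the table


if __name__ == "__main__":
    print(hex(effective_address(MAGIC_PAGE_OFFSET)))      # 64-bit view
    print(hex(effective_address(MAGIC_PAGE_OFFSET, 32)))  # 32-bit view
    print(fits_signed_16(MAGIC_PAGE_OFFSET))
    print(describe_return_code(12))
```

On a 64-bit target the page therefore sits at the very top of the effective address space, which is why no base register is needed to address it.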

+3

Documentation/virt/kvm/review-checklist.txt Documentation/virt/kvm/review-checklist.rst

··· 1 + .. SPDX-License-Identifier: GPL-2.0 2 + 3 + ================================ 1 4 Review checklist for kvm patches 2 5 ================================ 3 6

+8 -5

Documentation/virt/kvm/s390-diag.txt Documentation/virt/kvm/s390-diag.rst

··· 1 + .. SPDX-License-Identifier: GPL-2.0 2 + 3 + ============================= 1 4 The s390 DIAGNOSE call on KVM 2 5 ============================= 3 6 ··· 19 16 all supported DIAGNOSE calls need to be handled by either KVM or its 20 17 userspace. 21 18 22 - All DIAGNOSE calls supported by KVM use the RS-a format: 19 + All DIAGNOSE calls supported by KVM use the RS-a format:: 23 20 24 - -------------------------------------- 25 - | '83' | R1 | R3 | B2 | D2 | 26 - -------------------------------------- 27 - 0 8 12 16 20 31 21 + -------------------------------------- 22 + | '83' | R1 | R3 | B2 | D2 | 23 + -------------------------------------- 24 + 0 8 12 16 20 31 28 25 29 26 The second-operand address (obtained by the base/displacement calculation) 30 27 is not used to address data. Instead, bits 48-63 of this address specify
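
The RS-a layout shown in the s390-diag hunk above can be read as a simple bit-field split. The following is a minimal sketch, assuming the bit positions under the diagram (0, 8, 12, 16, 20, 31) count from the most significant bit of a 32-bit instruction word, and that address bits are numbered 0 to 63 in the same way. `RsA`, `decode_rs_a` and `address_bits_48_63` are illustrative names, not kernel symbols.

```python
from typing import NamedTuple


class RsA(NamedTuple):
    """Fields of an RS-a format instruction, as named in the diagram."""
    opcode: int
    r1: int
    r3: int
    b2: int
    d2: int


def decode_rs_a(insn: int) -> RsA:
    """Split a 32-bit RS-a instruction word; bit 0 is the most significant."""
    if not 0 <= insn <= 0xFFFFFFFF:
        raise ValueError("expected a 32-bit instruction word")
    return RsA(
        opcode=(insn >> 24) & 0xFF,  # bits 0-7, '83' for DIAGNOSE
        r1=(insn >> 20) & 0xF,       # bits 8-11
        r3=(insn >> 16) & 0xF,       # bits 12-15
        b2=(insn >> 12) & 0xF,       # bits 16-19, base register
        d2=insn & 0xFFF,             # bits 20-31, displacement
    )


def address_bits_48_63(second_operand_address: int) -> int:
    """Return bits 48-63 of a 64-bit address, i.e. its low 16 bits."""
    return second_operand_address & 0xFFFF


if __name__ == "__main__":
    fields = decode_rs_a(0x83240500)
    print(fields)
    print(hex(address_bits_48_63(fields.d2)))
```

The sample word `0x83240500` is made up for the demonstration. With a base register field of 0, the second-operand address is taken here to be the displacement alone.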

+127 -94

Documentation/virt/kvm/timekeeping.txt Documentation/virt/kvm/timekeeping.rst

··· 1 + .. SPDX-License-Identifier: GPL-2.0 1 2 2 - Timekeeping Virtualization for X86-Based Architectures 3 + ====================================================== 4 + Timekeeping Virtualization for X86-Based Architectures 5 + ====================================================== 3 6 4 - Zachary Amsden <zamsden@redhat.com> 5 - Copyright (c) 2010, Red Hat. All rights reserved. 7 + :Author: Zachary Amsden <zamsden@redhat.com> 8 + :Copyright: (c) 2010, Red Hat. All rights reserved. 6 9 7 - 1) Overview 8 - 2) Timing Devices 9 - 3) TSC Hardware 10 - 4) Virtualization Problems 10 + .. Contents 11 11 12 - ========================================================================= 12 + 1) Overview 13 + 2) Timing Devices 14 + 3) TSC Hardware 15 + 4) Virtualization Problems 13 16 14 - 1) Overview 17 + 1. Overview 18 + =========== 15 19 16 20 One of the most complicated parts of the X86 platform, and specifically, 17 21 the virtualization of this platform is the plethora of timing devices available ··· 31 27 timekeeping which may be difficult to find elsewhere, specifically, 32 28 information relevant to KVM and hardware-based virtualization. 33 29 34 - ========================================================================= 35 - 36 - 2) Timing Devices 30 + 2. Timing Devices 31 + ================= 37 32 38 33 First we discuss the basic hardware devices available. TSC and the related 39 34 KVM clock are special enough to warrant a full exposition and are described in 40 35 the following section. 41 36 42 - 2.1) i8254 - PIT 37 + 2.1. i8254 - PIT 38 + ---------------- 43 39 44 40 One of the first timer devices available is the programmable interrupt timer, 45 41 or PIT. The PIT has a fixed frequency 1.193182 MHz base clock and three ··· 54 50 using single or multiple byte access to the I/O ports. There are 6 modes 55 51 available, but not all modes are available to all timers, as only timer 2 56 52 has a connected gate input, required for modes 1 and 5. 
The gate line is 57 - controlled by port 61h, bit 0, as illustrated in the following diagram. 53 + controlled by port 61h, bit 0, as illustrated in the following diagram:: 58 54 59 - -------------- ---------------- 60 - | | | | 61 - | 1.1932 MHz |---------->| CLOCK OUT | ---------> IRQ 0 62 - | Clock | | | | 63 - -------------- | +->| GATE TIMER 0 | 55 + -------------- ---------------- 56 + | | | | 57 + | 1.1932 MHz|---------->| CLOCK OUT | ---------> IRQ 0 58 + | Clock | | | | 59 + -------------- | +->| GATE TIMER 0 | 64 60 | ---------------- 65 61 | 66 62 | ---------------- ··· 74 70 | | | 75 71 |------>| CLOCK OUT | ---------> Port 61h, bit 5 76 72 | | | 77 - Port 61h, bit 0 ---------->| GATE TIMER 2 | \_.---- ____ 73 + Port 61h, bit 0 -------->| GATE TIMER 2 | \_.---- ____ 78 74 ---------------- _| )--|LPF|---Speaker 79 75 / *---- \___/ 80 - Port 61h, bit 1 -----------------------------------/ 76 + Port 61h, bit 1 ---------------------------------/ 81 77 82 78 The timer modes are now described. 83 79 84 - Mode 0: Single Timeout. This is a one-shot software timeout that counts down 80 + Mode 0: Single Timeout. 81 + This is a one-shot software timeout that counts down 85 82 when the gate is high (always true for timers 0 and 1). When the count 86 83 reaches zero, the output goes high. 87 84 88 - Mode 1: Triggered One-shot. The output is initially set high. When the gate 85 + Mode 1: Triggered One-shot. 86 + The output is initially set high. When the gate 89 87 line is set high, a countdown is initiated (which does not stop if the gate is 90 88 lowered), during which the output is set low. When the count reaches zero, 91 89 the output goes high. 92 90 93 - Mode 2: Rate Generator. The output is initially set high. When the countdown 91 + Mode 2: Rate Generator. 92 + The output is initially set high. When the countdown 94 93 reaches 1, the output goes low for one count and then returns high. The value 95 94 is reloaded and the countdown automatically resumes. 
If the gate line goes 96 95 low, the count is halted. If the output is low when the gate is lowered, the 97 96 output automatically goes high (this only affects timer 2). 98 97 99 - Mode 3: Square Wave. This generates a high / low square wave. The count 98 + Mode 3: Square Wave. 99 + This generates a high / low square wave. The count 100 100 determines the length of the pulse, which alternates between high and low 101 101 when zero is reached. The count only proceeds when gate is high and is 102 102 automatically reloaded on reaching zero. The count is decremented twice at ··· 111 103 values are not observed when reading. This is the intended mode for timer 2, 112 104 which generates sine-like tones by low-pass filtering the square wave output. 113 105 114 - Mode 4: Software Strobe. After programming this mode and loading the counter, 106 + Mode 4: Software Strobe. 107 + After programming this mode and loading the counter, 115 108 the output remains high until the counter reaches zero. Then the output 116 109 goes low for 1 clock cycle and returns high. The counter is not reloaded. 117 110 Counting only occurs when gate is high. 118 111 119 - Mode 5: Hardware Strobe. After programming and loading the counter, the 112 + Mode 5: Hardware Strobe. 113 + After programming and loading the counter, the 120 114 output remains high. When the gate is raised, a countdown is initiated 121 115 (which does not stop if the gate is lowered). When the counter reaches zero, 122 116 the output goes low for 1 clock cycle and then returns high. The counter is ··· 128 118 command port, 0x43 is used to set the counter and mode for each of the three 129 119 timers. 
130 120 131 - PIT commands, issued to port 0x43, using the following bit encoding: 121 + PIT commands, issued to port 0x43, using the following bit encoding:: 132 122 133 - Bit 7-4: Command (See table below) 134 - Bit 3-1: Mode (000 = Mode 0, 101 = Mode 5, 11X = undefined) 135 - Bit 0 : Binary (0) / BCD (1) 123 + Bit 7-4: Command (See table below) 124 + Bit 3-1: Mode (000 = Mode 0, 101 = Mode 5, 11X = undefined) 125 + Bit 0 : Binary (0) / BCD (1) 136 126 137 - Command table: 127 + Command table:: 138 128 139 - 0000 - Latch Timer 0 count for port 0x40 129 + 0000 - Latch Timer 0 count for port 0x40 140 130 sample and hold the count to be read in port 0x40; 141 131 additional commands ignored until counter is read; 142 132 mode bits ignored. 143 133 144 - 0001 - Set Timer 0 LSB mode for port 0x40 134 + 0001 - Set Timer 0 LSB mode for port 0x40 145 135 set timer to read LSB only and force MSB to zero; 146 136 mode bits set timer mode 147 137 148 - 0010 - Set Timer 0 MSB mode for port 0x40 138 + 0010 - Set Timer 0 MSB mode for port 0x40 149 139 set timer to read MSB only and force LSB to zero; 150 140 mode bits set timer mode 151 141 152 - 0011 - Set Timer 0 16-bit mode for port 0x40 142 + 0011 - Set Timer 0 16-bit mode for port 0x40 153 143 set timer to read / write LSB first, then MSB; 154 144 mode bits set timer mode 155 145 156 - 0100 - Latch Timer 1 count for port 0x41 - as described above 157 - 0101 - Set Timer 1 LSB mode for port 0x41 - as described above 158 - 0110 - Set Timer 1 MSB mode for port 0x41 - as described above 159 - 0111 - Set Timer 1 16-bit mode for port 0x41 - as described above 146 + 0100 - Latch Timer 1 count for port 0x41 - as described above 147 + 0101 - Set Timer 1 LSB mode for port 0x41 - as described above 148 + 0110 - Set Timer 1 MSB mode for port 0x41 - as described above 149 + 0111 - Set Timer 1 16-bit mode for port 0x41 - as described above 160 150 161 - 1000 - Latch Timer 2 count for port 0x42 - as described above 162 - 1001 - Set Timer 
2 LSB mode for port 0x42 - as described above 163 - 1010 - Set Timer 2 MSB mode for port 0x42 - as described above 164 - 1011 - Set Timer 2 16-bit mode for port 0x42 as described above 151 + 1000 - Latch Timer 2 count for port 0x42 - as described above 152 + 1001 - Set Timer 2 LSB mode for port 0x42 - as described above 153 + 1010 - Set Timer 2 MSB mode for port 0x42 - as described above 154 + 1011 - Set Timer 2 16-bit mode for port 0x42 as described above 165 155 166 - 1101 - General counter latch 156 + 1101 - General counter latch 167 157 Latch combination of counters into corresponding ports 168 158 Bit 3 = Counter 2 169 159 Bit 2 = Counter 1 170 160 Bit 1 = Counter 0 171 161 Bit 0 = Unused 172 162 173 - 1110 - Latch timer status 163 + 1110 - Latch timer status 174 164 Latch combination of counter mode into corresponding ports 175 165 Bit 3 = Counter 2 176 166 Bit 2 = Counter 1 ··· 187 177 Bit 3-1 = Mode 188 178 Bit 0 = Binary (0) / BCD mode (1) 189 179 190 - 2.2) RTC 180 + 2.2. RTC 181 + -------- 191 182 192 183 The second device which was available in the original PC was the MC146818 real 193 184 time clock. The original device is now obsolete, and usually emulated by the ··· 212 201 The clock uses a 32.768kHz crystal, so bits 6-4 of register A should be 213 202 programmed to a 32kHz divider if the RTC is to count seconds. 
214 203 215 - This is the RAM map originally used for the RTC/CMOS: 204 + This is the RAM map originally used for the RTC/CMOS:: 216 205 217 - Location Size Description 218 - ------------------------------------------ 219 - 00h byte Current second (BCD) 220 - 01h byte Seconds alarm (BCD) 221 - 02h byte Current minute (BCD) 222 - 03h byte Minutes alarm (BCD) 223 - 04h byte Current hour (BCD) 224 - 05h byte Hours alarm (BCD) 225 - 06h byte Current day of week (BCD) 226 - 07h byte Current day of month (BCD) 227 - 08h byte Current month (BCD) 228 - 09h byte Current year (BCD) 229 - 0Ah byte Register A 206 + Location Size Description 207 + ------------------------------------------ 208 + 00h byte Current second (BCD) 209 + 01h byte Seconds alarm (BCD) 210 + 02h byte Current minute (BCD) 211 + 03h byte Minutes alarm (BCD) 212 + 04h byte Current hour (BCD) 213 + 05h byte Hours alarm (BCD) 214 + 06h byte Current day of week (BCD) 215 + 07h byte Current day of month (BCD) 216 + 08h byte Current month (BCD) 217 + 09h byte Current year (BCD) 218 + 0Ah byte Register A 230 219 bit 7 = Update in progress 231 220 bit 6-4 = Divider for clock 232 221 000 = 4.194 MHz ··· 245 234 1101 = 125 mS 246 235 1110 = 250 mS 247 236 1111 = 500 mS 248 - 0Bh byte Register B 237 + 0Bh byte Register B 249 238 bit 7 = Run (0) / Halt (1) 250 239 bit 6 = Periodic interrupt enable 251 240 bit 5 = Alarm interrupt enable ··· 254 243 bit 2 = BCD calendar (0) / Binary (1) 255 244 bit 1 = 12-hour mode (0) / 24-hour mode (1) 256 245 bit 0 = 0 (DST off) / 1 (DST enabled) 257 - OCh byte Register C (read only) 246 + OCh byte Register C (read only) 258 247 bit 7 = interrupt request flag (IRQF) 259 248 bit 6 = periodic interrupt flag (PF) 260 249 bit 5 = alarm interrupt flag (AF) 261 250 bit 4 = update interrupt flag (UF) 262 251 bit 3-0 = reserved 263 - ODh byte Register D (read only) 252 + ODh byte Register D (read only) 264 253 bit 7 = RTC has power 265 254 bit 6-0 = reserved 266 - 32h byte Current century 
BCD (*) 255 + 32h byte Current century BCD (*) 267 256 (*) location vendor specific and now determined from ACPI global tables 268 257 269 - 2.3) APIC 258 + 2.3. APIC 259 + --------- 270 260 271 261 On Pentium and later processors, an on-board timer is available to each CPU 272 262 as part of the Advanced Programmable Interrupt Controller. The APIC is ··· 288 276 of one-shot or periodic operation, and is based on the bus clock divided down 289 277 by the programmable divider register. 290 278 291 - 2.4) HPET 279 + 2.4. HPET 280 + --------- 292 281 293 282 HPET is quite complex, and was originally intended to replace the PIT / RTC 294 283 support of the X86 PC. It remains to be seen whether that will be the case, as ··· 310 297 Detailed specification of the HPET is beyond the current scope of this 311 298 document, as it is also very well documented elsewhere. 312 299 313 - 2.5) Offboard Timers 300 + 2.5. Offboard Timers 301 + -------------------- 314 302 315 303 Several cards, both proprietary (watchdog boards) and commonplace (e1000) have 316 304 timing chips built into the cards which may have registers which are accessible ··· 321 307 timer device would require additional support to be virtualized properly and is 322 308 not considered important at this time as no known operating system does this. 323 309 324 - ========================================================================= 325 - 326 - 3) TSC Hardware 310 + 3. TSC Hardware 311 + =============== 327 312 328 313 The TSC or time stamp counter is relatively simple in theory; it counts 329 314 instruction cycles issued by the processor, which can be used as a measure of ··· 353 340 promise to allow the TSC to additionally be scaled, but this hardware is not 354 341 yet widely available. 355 342 356 - 3.1) TSC synchronization 343 + 3.1. TSC synchronization 344 + ------------------------ 357 345 358 346 The TSC is a CPU-local clock in most implementations. 
This means, on SMP 359 347 platforms, the TSCs of different CPUs may start at different times depending ··· 371 357 values are read from the same clock, which generally only is possible on single 372 358 socket systems or those with special hardware support. 373 359 374 - 3.2) TSC and CPU hotplug 360 + 3.2. TSC and CPU hotplug 361 + ------------------------ 375 362 376 363 As touched on already, CPUs which arrive later than the boot time of the system 377 364 may not have a TSC value that is synchronized with the rest of the system. ··· 382 367 TSC is synchronized back to a state where TSC synchronization flaws, however 383 368 small, may be exposed to the OS and any virtualization environment. 384 369 385 - 3.3) TSC and multi-socket / NUMA 370 + 3.3. TSC and multi-socket / NUMA 371 + -------------------------------- 386 372 387 373 Multi-socket systems, especially large multi-socket systems are likely to have 388 374 individual clocksources rather than a single, universally distributed clock. ··· 401 385 It is recommended not to trust the TSCs to remain synchronized on NUMA or 402 386 multiple socket systems for these reasons. 403 387 404 - 3.4) TSC and C-states 388 + 3.4. TSC and C-states 389 + --------------------- 405 390 406 391 C-states, or idling states of the processor, especially C1E and deeper sleep 407 392 states may be problematic for TSC as well. The TSC may stop advancing in such ··· 413 396 The TSC in such a case may be corrected by catching it up to a known external 414 397 clocksource. 415 398 416 - 3.5) TSC frequency change / P-states 399 + 3.5. TSC frequency change / P-states 400 + ------------------------------------ 417 401 418 402 To make things slightly more interesting, some CPUs may change frequency. They 419 403 may or may not run the TSC at the same rate, and because the frequency change ··· 434 416 than that of non-halted processors. AMD Turion processors are known to have 435 417 this problem. 
436 418 437 - 3.6) TSC and STPCLK / T-states 419 + 3.6. TSC and STPCLK / T-states 420 + ------------------------------ 438 421 439 422 External signals given to the processor may also have the effect of stopping 440 423 the TSC. This is typically done for thermal emergency power control to prevent 441 424 an overheating condition, and typically, there is no way to detect that this 442 425 condition has happened. 443 426 444 - 3.7) TSC virtualization - VMX 427 + 3.7. TSC virtualization - VMX 428 + ----------------------------- 445 429 446 430 VMX provides conditional trapping of RDTSC, RDMSR, WRMSR and RDTSCP 447 431 instructions, which is enough for full virtualization of TSC in any manner. In ··· 451 431 field specified in the VMCS. Special instructions must be used to read and 452 432 write the VMCS field. 453 433 454 - 3.8) TSC virtualization - SVM 434 + 3.8. TSC virtualization - SVM 435 + ----------------------------- 455 436 456 437 SVM provides conditional trapping of RDTSC, RDMSR, WRMSR and RDTSCP 457 438 instructions, which is enough for full virtualization of TSC in any manner. In 458 439 addition, SVM allows passing through the host TSC plus an additional offset 459 440 field specified in the SVM control block. 460 441 461 - 3.9) TSC feature bits in Linux 442 + 3.9. TSC feature bits in Linux 443 + ------------------------------ 462 444 463 445 In summary, there is no way to guarantee the TSC remains in perfect 464 446 synchronization unless it is explicitly guaranteed by the architecture. Even ··· 470 448 The following feature bits are used by Linux to signal various TSC attributes, 471 449 but they can only be taken to be meaningful for UP or single node systems. 
472 450 473 - X86_FEATURE_TSC : The TSC is available in hardware 474 - X86_FEATURE_RDTSCP : The RDTSCP instruction is available 475 - X86_FEATURE_CONSTANT_TSC : The TSC rate is unchanged with P-states 476 - X86_FEATURE_NONSTOP_TSC : The TSC does not stop in C-states 477 - X86_FEATURE_TSC_RELIABLE : TSC sync checks are skipped (VMware) 451 + ========================= ======================================= 452 + X86_FEATURE_TSC The TSC is available in hardware 453 + X86_FEATURE_RDTSCP The RDTSCP instruction is available 454 + X86_FEATURE_CONSTANT_TSC The TSC rate is unchanged with P-states 455 + X86_FEATURE_NONSTOP_TSC The TSC does not stop in C-states 456 + X86_FEATURE_TSC_RELIABLE TSC sync checks are skipped (VMware) 457 + ========================= ======================================= 478 458 479 - 4) Virtualization Problems 459 + 4. Virtualization Problems 460 + ========================== 480 461 481 462 Timekeeping is especially problematic for virtualization because a number of 482 463 challenges arise. The most obvious problem is that time is now shared between ··· 498 473 cause similar problems to virtualization makes it a good justification for 499 474 solving many of these problems on bare metal. 500 475 501 - 4.1) Interrupt clocking 476 + 4.1. Interrupt clocking 477 + ----------------------- 502 478 503 479 One of the most immediate problems that occurs with legacy operating systems 504 480 is that the system timekeeping routines are often designed to keep track of ··· 528 502 rate (ed: is it 18.2 Hz?) however that it has not yet been a problem in 529 503 practice. 530 504 531 - 4.2) TSC sampling and serialization 505 + 4.2. TSC sampling and serialization 506 + ----------------------------------- 532 507 533 508 As the highest precision time source available, the cycle counter of the CPU 534 509 has aroused much interest from developers. 
As explained above, this timer has ··· 551 524 the TSC as seen from other CPUs, even in an otherwise perfectly synchronized 552 525 system. 553 526 554 - 4.3) Timespec aliasing 527 + 4.3. Timespec aliasing 528 + ---------------------- 555 529 556 530 Additionally, this lack of serialization from the TSC poses another challenge 557 531 when using results of the TSC when measured against another time source. As ··· 576 548 and any other values derived from TSC computation (such as TSC virtualization 577 549 itself). 578 550 579 - 4.4) Migration 551 + 4.4. Migration 552 + -------------- 580 553 581 554 Migration of a virtual machine raises problems for timekeeping in two ways. 582 555 First, the migration itself may take time, during which interrupts cannot be ··· 595 566 simply storing multipliers and offsets against the TSC for the guest to convert 596 567 back into nanosecond resolution values. 597 568 598 - 4.5) Scheduling 569 + 4.5. Scheduling 570 + --------------- 599 571 600 572 Since scheduling may be based on precise timing and firing of interrupts, the 601 573 scheduling algorithms of an operating system may be adversely affected by ··· 609 579 paravirtualized scheduler clock, which reveals the true amount of CPU time for 610 580 which a virtual machine has been running. 611 581 612 - 4.6) Watchdogs 582 + 4.6. Watchdogs 583 + -------------- 613 584 614 585 Watchdog timers, such as the lock detector in Linux may fire accidentally when 615 586 running under hardware virtualization due to timer interrupts being delayed or ··· 618 587 spurious and can be ignored, but in some circumstances it may be necessary to 619 588 disable such detection. 620 589 621 - 4.7) Delays and precision timing 590 + 4.7. Delays and precision timing 591 + -------------------------------- 622 592 623 593 Precise timing and delays may not be possible in a virtualized system. 
This 624 594 can happen if the system is controlling physical hardware, or issues delays to ··· 632 600 significant issue. In many cases these delays may be eliminated through 633 601 configuration or paravirtualization. 634 602 635 - 4.8) Covert channels and leaks 603 + 4.8. Covert channels and leaks 604 + ------------------------------ 636 605 637 606 In addition to the above problems, time information will inevitably leak to the 638 607 guest about the host in anything but a perfect implementation of virtualized

+806 -992

Documentation/virt/uml/UserModeLinux-HOWTO.txt Documentation/virt/uml/user_mode_linux.rst

··· 1 - User Mode Linux HOWTO 2 - User Mode Linux Core Team 3 - Mon Nov 18 14:16:16 EST 2002 1 + .. SPDX-License-Identifier: GPL-2.0 4 2 5 - This document describes the use and abuse of Jeff Dike's User Mode 6 - Linux: a port of the Linux kernel as a normal Intel Linux process. 7 - ______________________________________________________________________ 3 + ===================== 4 + User Mode Linux HOWTO 5 + ===================== 8 6 9 - Table of Contents 7 + :Author: User Mode Linux Core Team 8 + :Last-updated: Sat Jan 25 16:07:55 CET 2020 9 + 10 + This document describes the use and abuse of Jeff Dike's User Mode 11 + Linux: a port of the Linux kernel as a normal Intel Linux process. 12 + 13 + 14 + .. Table of Contents 10 15 11 16 1. Introduction 12 17 ··· 137 132 15.5 Other contributions 138 133 139 134 140 - ______________________________________________________________________ 141 - 142 - 1. Introduction 135 + 1. Introduction 136 + ================ 143 137 144 138 Welcome to User Mode Linux. It's going to be fun. 145 139 146 140 147 141 148 - 1.1. How is User Mode Linux Different? 142 + 1.1. How is User Mode Linux Different? 143 + --------------------------------------- 149 144 150 145 Normally, the Linux Kernel talks straight to your hardware (video 151 146 card, keyboard, hard drives, etc), and any programs which run ask the 152 - kernel to operate the hardware, like so: 147 + kernel to operate the hardware, like so:: 153 148 154 149 155 150 ··· 165 160 166 161 167 162 The User Mode Linux Kernel is different; instead of talking to the 168 - hardware, it talks to a `real' Linux kernel (called the `host kernel' 163 + hardware, it talks to a `real` Linux kernel (called the `host kernel` 169 164 from now on), like any other program. Programs can then run inside 170 165 User-Mode Linux as if they were running under a normal kernel, like 171 - so: 166 + so:: 172 167 173 168 174 169 ··· 186 181 187 182 188 183 189 - 1.2. Why Would I Want User Mode Linux? 184 + 1.2. 
Why Would I Want User Mode Linux? 185 + --------------------------------------- 190 186 191 187 192 188 1. If User Mode Linux crashes, your host kernel is still fine. ··· 210 204 211 205 212 206 207 + .. _Compiling_the_kernel_and_modules: 208 + 209 + 2. Compiling the kernel and modules 210 + ==================================== 213 211 214 212 215 - 2. Compiling the kernel and modules 216 213 217 214 218 - 219 - 220 - 2.1. Compiling the kernel 215 + 2.1. Compiling the kernel 216 + -------------------------- 221 217 222 218 223 219 Compiling the user mode kernel is just like compiling any other 224 - kernel. Let's go through the steps, using 2.4.0-prerelease (current 225 - as of this writing) as an example: 220 + kernel. 226 221 227 222 228 - 1. Download the latest UML patch from 229 - 230 - the download page <http://user-mode-linux.sourceforge.net/ 231 - 232 - In this example, the file is uml-patch-2.4.0-prerelease.bz2. 233 - 234 - 235 - 2. Download the matching kernel from your favourite kernel mirror, 223 + 1. Download the latest kernel from your favourite kernel mirror, 236 224 such as: 237 225 238 - ftp://ftp.ca.kernel.org/pub/kernel/v2.4/linux-2.4.0-prerelease.tar.bz2 239 - <ftp://ftp.ca.kernel.org/pub/kernel/v2.4/linux-2.4.0-prerelease.tar.bz2> 240 - . 226 + https://mirrors.edge.kernel.org/pub/linux/kernel/v5.x/linux-5.4.14.tar.xz 241 227 242 - 243 - 3. Make a directory and unpack the kernel into it. 244 - 245 - 228 + 2. Make a directory and unpack the kernel into it:: 246 229 247 230 host% 248 231 mkdir ~/uml 249 232 250 - 251 - 252 - 253 - 254 - 255 233 host% 256 234 cd ~/uml 257 235 258 - 259 - 260 - 261 - 262 - 263 236 host% 264 - tar -xzvf linux-2.4.0-prerelease.tar.bz2 237 + tar xvf linux-5.4.14.tar.xz 265 238 266 239 267 - 268 - 269 - 270 - 271 - 4. Apply the patch using 272 - 273 - 274 - 275 - host% 276 - cd ~/uml/linux 277 - 278 - 279 - 280 - host% 281 - bzcat uml-patch-2.4.0-prerelease.bz2 | patch -p1 282 - 283 - 284 - 285 - 286 - 287 - 288 - 5. 
Run your favorite config; `make xconfig ARCH=um' is the most 289 - convenient. `make config ARCH=um' and 'make menuconfig ARCH=um' 240 + 3. Run your favorite config; ``make xconfig ARCH=um`` is the most 241 + convenient. ``make config ARCH=um`` and ``make menuconfig ARCH=um`` 290 242 will work as well. The defaults will give you a useful kernel. If 291 243 you want to change something, go ahead, it probably won't hurt 292 244 anything. ··· 252 288 253 289 Note: If the host is configured with a 2G/2G address space split 254 290 rather than the usual 3G/1G split, then the packaged UML binaries 255 - will not run. They will immediately segfault. See ``UML on 2G/2G 256 - hosts'' for the scoop on running UML on your system. 291 + will not run. They will immediately segfault. See 292 + :ref:`UML_on_2G/2G_hosts` for the scoop on running UML on your system. 257 293 258 294 259 295 260 - 6. Finish with `make linux ARCH=um': the result is a file called 261 - `linux' in the top directory of your source tree. 262 - 263 - Make sure that you don't build this kernel in /usr/src/linux. On some 264 - distributions, /usr/include/asm is a link into this pool. The user- 265 - mode build changes the other end of that link, and things that include 266 - <asm/anything.h> stop compiling. 267 - 268 - The sources are also available from cvs at the project's cvs page, 269 - which has directions on getting the sources. You can also browse the 270 - CVS pool from there. 271 - 272 - If you get the CVS sources, you will have to check them out into an 273 - empty directory. You will then have to copy each file into the 274 - corresponding directory in the appropriate kernel pool. 275 - 276 - If you don't have the latest kernel pool, you can get the 277 - corresponding user-mode sources with 296 + 4. Finish with ``make linux ARCH=um``: the result is a file called 297 + ``linux`` in the top directory of your source tree. 
278 298 279 299 280 - host% cvs co -r v_2_3_x linux 281 - 282 - 283 - 284 - 285 - where 'x' is the version in your pool. Note that you will not get the 286 - bug fixes and enhancements that have gone into subsequent releases. 287 - 288 - 289 - 2.2. Compiling and installing kernel modules 300 + 2.2. Compiling and installing kernel modules 301 + --------------------------------------------- 290 302 291 303 UML modules are built in the same way as the native kernel (with the 292 - exception of the 'ARCH=um' that you always need for UML): 304 + exception of the 'ARCH=um' that you always need for UML):: 293 305 294 306 295 307 host% make modules ARCH=um ··· 277 337 the user-mode pool. Modules from the native kernel won't work. 278 338 279 339 You can install them by using ftp or something to copy them into the 280 - virtual machine and dropping them into /lib/modules/`uname -r`. 340 + virtual machine and dropping them into ``/lib/modules/$(uname -r)``. 281 341 282 342 You can also get the kernel build process to install them as follows: 283 343 284 344 1. with the kernel not booted, mount the root filesystem in the top 285 - level of the kernel pool: 345 + level of the kernel pool:: 286 346 287 347 288 348 host% mount root_fs mnt -o loop ··· 292 352 293 353 294 354 295 - 2. run 355 + 2. run:: 296 356 297 357 298 358 host% ··· 303 363 304 364 305 365 306 - 3. unmount the filesystem 366 + 3. unmount the filesystem:: 307 367 308 368 309 369 host% umount mnt ··· 321 381 as modules, especially filesystems and network protocols and filters, 322 382 so most symbols which need to be exported probably already are. 323 383 However, if you do find symbols that need exporting, let us 324 - <http://user-mode-linux.sourceforge.net/> know, and 384 + know at http://user-mode-linux.sourceforge.net/, and 325 385 they'll be "taken care of". 326 386 327 387 328 388 329 - 2.3. Compiling and installing uml_utilities 389 + 2.3. 
Compiling and installing uml_utilities 390 + -------------------------------------------- 330 391 331 392 Many features of the UML kernel require a user-space helper program, 332 393 so a uml_utilities package is distributed separately from the kernel 333 394 patch which provides these helpers. Included within this is: 334 395 335 - o port-helper - Used by consoles which connect to xterms or ports 396 + - port-helper - Used by consoles which connect to xterms or ports 336 397 337 - o tunctl - Configuration tool to create and delete tap devices 398 + - tunctl - Configuration tool to create and delete tap devices 338 399 339 - o uml_net - Setuid binary for automatic tap device configuration 400 + - uml_net - Setuid binary for automatic tap device configuration 340 401 341 - o uml_switch - User-space virtual switch required for daemon 402 + - uml_switch - User-space virtual switch required for daemon 342 403 transport 343 404 344 - The uml_utilities tree is compiled with: 405 + The uml_utilities tree is compiled with:: 345 406 346 407 347 408 host# ··· 364 423 365 424 366 425 367 - 3. Running UML and logging in 426 + 3. Running UML and logging in 427 + ============================== 368 428 369 429 370 430 371 - 3.1. Running UML 431 + 3.1. Running UML 432 + ----------------- 372 433 373 - It runs on 2.2.15 or later, and all 2.4 kernels. 434 + It runs on 2.2.15 or later, and all kernel versions since 2.4. 374 435 375 436 376 437 Booting UML is straightforward. Simply run 'linux': it will try to 377 - mount the file `root_fs' in the current directory. You do not need to 378 - run it as root. If your root filesystem is not named `root_fs', then 379 - you need to put a `ubd0=root_fs_whatever' switch on the linux command 438 + mount the file ``root_fs`` in the current directory. You do not need to 439 + run it as root. If your root filesystem is not named ``root_fs``, then 440 + you need to put a ``ubd0=root_fs_whatever`` switch on the linux command 380 441 line. 
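As a rough sketch of the rule above, the helper below prints the command line to use for a given root filesystem image. The function name `uml_cmdline` is hypothetical, and the sketch assumes the `linux` binary sits in the current directory.

```shell
#!/bin/sh
# Hypothetical helper: print the UML command line for a root filesystem image.
# With no argument, or with the default name root_fs, no switch is needed;
# any other image name must be passed through the ubd0= switch.
uml_cmdline() {
    image=${1:-root_fs}
    if [ "$image" = root_fs ]; then
        echo "./linux"
    else
        echo "./linux ubd0=$image"
    fi
}
```

For example, `uml_cmdline root_fs_debian_22` prints `./linux ubd0=root_fs_debian_22`, which can then be run as an ordinary user.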
381 442 382 443 383 444 You will need a filesystem to boot UML from. There are a number 384 - available for download from here <http://user-mode- 385 - linux.sourceforge.net/> . There are also several tools 386 - <http://user-mode-linux.sourceforge.net/> which can be 445 + available for download from http://user-mode-linux.sourceforge.net. 446 + There are also several tools at 447 + http://user-mode-linux.sourceforge.net/ which can be 387 448 used to generate UML-compatible filesystem images from media. 388 449 The kernel will boot up and present you with a login prompt. 389 450 390 451 391 - Note: If the host is configured with a 2G/2G address space split 452 + Note: 453 + If the host is configured with a 2G/2G address space split 392 454 rather than the usual 3G/1G split, then the packaged UML binaries will 393 - not run. They will immediately segfault. See ``UML on 2G/2G hosts'' 455 + not run. They will immediately segfault. See :ref:`UML_on_2G/2G_hosts` 394 456 for the scoop on running UML on your system. 395 457 396 458 397 459 398 - 3.2. Logging in 460 + 3.2. Logging in 461 + ---------------- 399 462 400 463 401 464 ··· 413 468 414 469 There are a couple of other ways to log in: 415 470 416 - o On a virtual console 471 + - On a virtual console 417 472 418 473 419 474 420 475 Each virtual console that is configured (i.e. the device exists in 421 476 /dev and /etc/inittab runs a getty on it) will come up in its own 422 - xterm. If you get tired of the xterms, read ``Setting up serial 423 - lines and consoles'' to see how to attach the consoles to 424 - something else, like host ptys. 477 + xterm. If you get tired of the xterms, read 478 + :ref:`setting_up_serial_lines_and_consoles` to see how to attach 479 + the consoles to something else, like host ptys. 
425 480 426 481 427 482 428 - o Over the serial line 483 + - Over the serial line 429 484 430 485 431 - In the boot output, find a line that looks like: 486 + In the boot output, find a line that looks like:: 432 487 433 488 434 489 ··· 438 493 439 494 440 495 Attach your favorite terminal program to the corresponding tty. I.e. 441 - for minicom, the command would be 496 + for minicom, the command would be:: 442 497 443 498 444 499 host% minicom -o -p /dev/ttyp1 ··· 448 503 449 504 450 505 451 - o Over the net 506 + - Over the net 452 507 453 508 454 509 If the network is running, then you can telnet to the virtual 455 - machine and log in to it. See ``Setting up the network'' to learn 510 + machine and log in to it. See :ref:`Setting_up_the_network` to learn 456 511 about setting up a virtual network. 457 512 458 513 When you're done using it, run halt, and the kernel will bring itself 459 514 down and the process will exit. 460 515 461 516 462 - 3.3. Examples 517 + 3.3. Examples 518 + -------------- 463 519 464 520 Here are some examples of UML in action: 465 521 466 - o A login session <http://user-mode-linux.sourceforge.net/login.html> 522 + - A login session http://user-mode-linux.sourceforge.net/old/login.html 467 523 468 - o A virtual network <http://user-mode-linux.sourceforge.net/net.html> 469 - 470 - 524 + - A virtual network http://user-mode-linux.sourceforge.net/old/net.html 471 525 472 526 473 527 474 528 475 529 476 - 4. UML on 2G/2G hosts 530 + .. _UML_on_2G/2G_hosts: 531 + 532 + 4. UML on 2G/2G hosts 533 + ====================== 477 534 478 535 479 536 480 537 481 - 4.1. Introduction 538 + 4.1. Introduction 539 + ------------------ 482 540 483 541 484 542 Most Linux machines are configured so that the kernel occupies the ··· 494 546 495 547 496 548 497 - 4.2. The problem 549 + 4.2. The problem 550 + ----------------- 498 551 499 552 500 553 The prebuilt UML binaries on this site will not run on 2G/2G hosts ··· 507 558 508 559 509 560 510 - 4.3. 
The solution 561 + 4.3. The solution 562 + ------------------ 511 563 512 564 513 565 The fix for this is to rebuild UML from source after enabling 514 566 CONFIG_HOST_2G_2G (under 'General Setup'). This will cause UML to 515 567 load itself in the top .5G of that smaller process address space, 516 - where it will run fine. See ``Compiling the kernel and modules'' if 568 + where it will run fine. See :ref:`Compiling_the_kernel_and_modules` if 517 569 you need help building UML from source. 518 570 519 571 ··· 523 573 524 574 525 575 576 + .. _setting_up_serial_lines_and_consoles: 526 577 527 578 528 - 529 - 5. Setting up serial lines and consoles 579 + 5. Setting up serial lines and consoles 580 + ======================================== 530 581 531 582 532 583 It is possible to attach UML serial lines and consoles to many types ··· 535 584 536 585 537 586 You can attach them to host ptys, ttys, file descriptors, and ports. 538 - This allows you to do things like 587 + This allows you to do things like: 539 588 540 - o have a UML console appear on an unused host console, 589 + - have a UML console appear on an unused host console, 541 590 542 - o hook two virtual machines together by having one attach to a pty 591 + - hook two virtual machines together by having one attach to a pty 543 592 and having the other attach to the corresponding tty 544 593 545 - o make a virtual machine accessible from the net by attaching a 594 + - make a virtual machine accessible from the net by attaching a 546 595 console to a port on the host. 547 596 548 597 549 - The general format of the command line option is device=channel. 598 + The general format of the command line option is ``device=channel``. 550 599 551 600 552 601 553 - 5.1. Specifying the device 602 + 5.1. 
Specifying the device 603 + --------------------------- 554 604 555 605 Devices are specified with "con" or "ssl" (console or serial line, 556 606 respectively), optionally with a device number if you are talking ··· 565 613 566 614 A specific device name will override a less general "con=" or "ssl=". 567 615 So, for example, you can assign a pty to each of the serial lines 568 - except for the first two like this: 616 + except for the first two like this:: 569 617 570 618 571 619 ssl=pty ssl0=tty:/dev/tty0 ssl1=tty:/dev/tty1 ··· 578 626 579 627 580 628 581 - 5.2. Specifying the channel 629 + 5.2. Specifying the channel 630 + ---------------------------- 582 631 583 632 There are a number of different types of channels to attach a UML 584 633 device to, each with a different way of specifying exactly what to 585 634 attach to. 586 635 587 - o pseudo-terminals - device=pty pts terminals - device=pts 636 + - pseudo-terminals - device=pty pts terminals - device=pts 588 637 589 638 590 639 This will cause UML to allocate a free host pseudo-terminal for the ··· 593 640 log. 
You access it by attaching a terminal program to the 594 641 corresponding tty: 595 642 596 - o screen /dev/pts/n 643 + - screen /dev/pts/n 597 644 598 - o screen /dev/ttyxx 645 + - screen /dev/ttyxx 599 646 600 - o minicom -o -p /dev/ttyxx - minicom seems not able to handle pts 647 + - minicom -o -p /dev/ttyxx - minicom seems not able to handle pts 601 648 devices 602 649 603 - o kermit - start it up, 'open' the device, then 'connect' 650 + - kermit - start it up, 'open' the device, then 'connect' 604 651 605 652 606 653 607 654 608 655 609 - o terminals - device=tty:tty device file 656 + - terminals - device=tty:tty device file 610 657 611 658 612 - This will make UML attach the device to the specified tty (i.e 659 + This will make UML attach the device to the specified tty (i.e:: 613 660 614 661 615 662 con1=tty:/dev/tty3 ··· 625 672 626 673 627 674 628 - o xterms - device=xterm 675 + - xterms - device=xterm 629 676 630 677 631 678 UML will run an xterm and the device will be attached to it. ··· 634 681 635 682 636 683 637 - o Port - device=port:port number 684 + - Port - device=port:port number 638 685 639 686 640 687 This will attach the UML devices to the specified host port. 641 688 Attaching console 1 to the host's port 9000 would be done like 642 - this: 689 + this:: 643 690 644 691 645 692 con1=port:9000 ··· 647 694 648 695 649 696 650 - Attaching all the serial lines to that port would be done similarly: 697 + Attaching all the serial lines to that port would be done similarly:: 651 698 652 699 653 700 ssl=port:9000 ··· 655 702 656 703 657 704 658 - You access these devices by telnetting to that port. Each active tel- 659 - net session gets a different device. If there are more telnets to a 705 + You access these devices by telnetting to that port. Each active 706 + telnet session gets a different device. 
If there are more telnets to a 660 707 port than UML devices attached to it, then the extra telnet sessions 661 708 will block until an existing telnet detaches, or until another device 662 709 becomes active (i.e. by being activated in /etc/inittab). ··· 678 725 679 726 680 727 681 - o already-existing file descriptors - device=file descriptor 728 + - already-existing file descriptors - device=file descriptor 682 729 683 730 684 731 If you set up a file descriptor on the UML command line, you can 685 732 attach a UML device to it. This is most commonly used to put the 686 733 main console back on stdin and stdout after assigning all the other 687 - consoles to something else: 734 + consoles to something else:: 688 735 689 736 690 737 con0=fd:0,fd:1 con=pts ··· 696 743 697 744 698 745 699 - o Nothing - device=null 746 + - Nothing - device=null 700 747 701 748 702 749 This allows the device to be opened, in contrast to 'none', but ··· 707 754 708 755 709 756 710 - o None - device=none 757 + - None - device=none 711 758 712 759 713 760 This causes the device to disappear. ··· 715 762 716 763 717 764 You can also specify different input and output channels for a device 718 - by putting a comma between them: 765 + by putting a comma between them:: 719 766 720 767 721 768 ssl3=tty:/dev/tty2,xterm ··· 738 785 739 786 740 787 741 - 5.3. Examples 788 + 5.3. Examples 789 + -------------- 742 790 743 791 There are a number of interesting things you can do with this 744 792 capability. 
745 793 746 794 747 795 First, this is how you get rid of those bleeding console xterms by 748 - attaching them to host ptys: 796 + attaching them to host ptys:: 749 797 750 798 751 799 con=pty con0=fd:0,fd:1 ··· 756 802 757 803 This will make a UML console take over an unused host virtual console, 758 804 so that when you switch to it, you will see the UML login prompt 759 - rather than the host login prompt: 805 + rather than the host login prompt:: 760 806 761 807 762 808 con1=tty:/dev/tty6 ··· 767 813 You can attach two virtual machines together with what amounts to a 768 814 serial line as follows: 769 815 770 - Run one UML with a serial line attached to a pty - 816 + Run one UML with a serial line attached to a pty:: 771 817 772 818 773 819 ssl1=pty ··· 779 825 that it got /dev/ptyp1). 780 826 781 827 Boot the other UML with a serial line attached to the corresponding 782 - tty - 828 + tty:: 783 829 784 830 785 831 ssl1=tty:/dev/ttyp1 ··· 792 838 prompt of the other virtual machine. 793 839 794 840 795 - 6. Setting up the network 841 + .. _setting_up_the_network: 842 + 843 + 6. Setting up the network 844 + ========================== 796 845 797 846 798 847 ··· 815 858 There are currently five transport types available for a UML virtual 816 859 machine to exchange packets with other hosts: 817 860 818 - o ethertap 861 + - ethertap 819 862 820 - o TUN/TAP 863 + - TUN/TAP 821 864 822 - o Multicast 865 + - Multicast 823 866 824 - o a switch daemon 867 + - a switch daemon 825 868 826 - o slip 869 + - slip 827 870 828 - o slirp 871 + - slirp 829 872 830 - o pcap 873 + - pcap 831 874 832 875 The TUN/TAP, ethertap, slip, and slirp transports allow a UML 833 876 instance to exchange packets with the host. They may be directed ··· 850 893 With so many host transports, which one should you use? 
Here's when 851 894 you should use each one: 852 895 853 - o ethertap - if you want access to the host networking and it is 896 + - ethertap - if you want access to the host networking and it is 854 897 running 2.2 855 898 856 - o TUN/TAP - if you want access to the host networking and it is 899 + - TUN/TAP - if you want access to the host networking and it is 857 900 running 2.4. Also, the TUN/TAP transport is able to use a 858 901 preconfigured device, allowing it to avoid using the setuid uml_net 859 902 helper, which is a security advantage. 860 903 861 - o Multicast - if you want a purely virtual network and you don't want 904 + - Multicast - if you want a purely virtual network and you don't want 862 905 to set up anything but the UML 863 906 864 - o a switch daemon - if you want a purely virtual network and you 907 + - a switch daemon - if you want a purely virtual network and you 865 908 don't mind running the daemon in order to get somewhat better 866 909 performance 867 910 868 - o slip - there is no particular reason to run the slip backend unless 911 + - slip - there is no particular reason to run the slip backend unless 869 912 ethertap and TUN/TAP are just not available for some reason 870 913 871 - o slirp - if you don't have root access on the host to setup 914 + - slirp - if you don't have root access on the host to setup 872 915 networking, or if you don't want to allocate an IP to your UML 873 916 874 - o pcap - not much use for actual network connectivity, but great for 917 + - pcap - not much use for actual network connectivity, but great for 875 918 monitoring traffic on the host 876 919 877 920 Ethertap is available on 2.4 and works fine. TUN/TAP is preferred ··· 883 926 exploit the helper's root privileges. 884 927 885 928 886 - 6.1. General setup 929 + 6.1. General setup 930 + ------------------- 887 931 888 932 First, you must have the virtual network enabled in your UML. 
If are 889 933 running a prebuilt kernel from this site, everything is already ··· 896 938 The next step is to provide a network device to the virtual machine. 897 939 This is done by describing it on the kernel command line. 898 940 899 - The general format is 941 + The general format is:: 900 942 901 943 902 944 eth <n> = <transport> , <transport args> ··· 905 947 906 948 907 949 For example, a virtual ethernet device may be attached to a host 908 - ethertap device as follows: 950 + ethertap device as follows:: 909 951 910 952 911 953 eth0=ethertap,tap0,fe:fd:0:0:0:1,192.168.0.254 ··· 936 978 937 979 938 980 You can also add devices to a UML and remove them at runtime. See the 939 - ``The Management Console'' page for details. 981 + :ref:`The_Management_Console` page for details. 940 982 941 983 942 984 The sections below describe this in more detail. ··· 953 995 954 996 955 997 956 - 6.2. Userspace daemons 998 + 6.2. Userspace daemons 999 + ----------------------- 957 1000 958 1001 You will likely need the setuid helper, or the switch daemon, or both. 959 1002 They are both installed with the RPM and deb, so if you've installed ··· 970 1011 971 1012 972 1013 973 - 6.3. Specifying ethernet addresses 1014 + 6.3. Specifying ethernet addresses 1015 + ----------------------------------- 974 1016 975 1017 Below, you will see that the TUN/TAP, ethertap, and daemon interfaces 976 1018 allow you to specify hardware addresses for the virtual ethernet ··· 983 1023 sufficient to guarantee a unique hardware address for the device. 
A 984 1024 couple of exceptions are: 985 1025 986 - o Another set of virtual ethernet devices are on the same network and 1026 + - Another set of virtual ethernet devices are on the same network and 987 1027 they are assigned hardware addresses using a different scheme which 988 1028 may conflict with the UML IP address-based scheme 989 1029 990 - o You aren't going to use the device for IP networking, so you don't 1030 + - You aren't going to use the device for IP networking, so you don't 991 1031 assign the device an IP address 992 1032 993 1033 If you let the driver provide the hardware address, you should make 994 1034 sure that the device IP address is known before the interface is 995 - brought up. So, inside UML, this will guarantee that: 1035 + brought up. So, inside UML, this will guarantee that:: 996 1036 997 1037 998 1038 999 - UML# 1000 - ifconfig eth0 192.168.0.250 up 1039 + UML# 1040 + ifconfig eth0 192.168.0.250 up 1001 1041 1002 1042 1003 1043 ··· 1009 1049 1010 1050 1011 1051 1012 - 6.4. UML interface setup 1052 + 6.4. UML interface setup 1053 + ------------------------- 1013 1054 1014 1055 Once the network devices have been described on the command line, you 1015 1056 should boot UML and log in. 1016 1057 1017 1058 1018 - The first thing to do is bring the interface up: 1059 + The first thing to do is bring the interface up:: 1019 1060 1020 1061 1021 1062 UML# ifconfig ethn ip-address up ··· 1028 1067 1029 1068 1030 1069 To reach the rest of the world, you should set a default route to the 1031 - host: 1070 + host:: 1032 1071 1033 1072 1034 1073 UML# route add default gw host ip ··· 1036 1075 1037 1076 1038 1077 1039 - Again, with host ip of 192.168.0.4: 1078 + Again, with host ip of 192.168.0.4:: 1040 1079 1041 1080 1042 1081 UML# route add default gw 192.168.0.4 ··· 1058 1097 Note: If you can't communicate with other hosts on your physical 1059 1098 ethernet, it's probably because of a network route that's 1060 1099 automatically set up. 
If you run 'route -n' and see a route that 1061 - looks like this: 1100 + looks like this:: 1062 1101 1063 1102 1064 1103 1065 1104 1066 - Destination Gateway Genmask Flags Metric Ref Use Iface 1067 - 192.168.0.0 0.0.0.0 255.255.255.0 U 0 0 0 eth0 1105 + Destination Gateway Genmask Flags Metric Ref Use Iface 1106 + 192.168.0.0 0.0.0.0 255.255.255.0 U 0 0 0 eth0 1068 1107 1069 1108 1070 1109 1071 1110 1072 1111 with a mask that's not 255.255.255.255, then replace it with a route 1073 - to your host: 1112 + to your host:: 1074 1113 1075 1114 1076 1115 UML# 1077 1116 route del -net 192.168.0.0 dev eth0 netmask 255.255.255.0 1078 - 1079 - 1080 - 1081 - 1082 1117 1083 1118 1084 1119 UML# ··· 1088 1131 1089 1132 1090 1133 1091 - 6.5. Multicast 1134 + 6.5. Multicast 1135 + --------------- 1092 1136 1093 1137 The simplest way to set up a virtual network between multiple UMLs is 1094 1138 to use the mcast transport. This was written by Harald Welte and is ··· 1100 1142 messages when you bring the device up inside UML. 1101 1143 1102 1144 1103 - To use it, run two UMLs with 1145 + To use it, run two UMLs with:: 1104 1146 1105 1147 1106 1148 eth0=mcast ··· 1109 1151 1110 1152 1111 1153 on their command lines. Log in, configure the ethernet device in each 1112 - machine with different IP addresses: 1154 + machine with different IP addresses:: 1113 1155 1114 1156 1115 1157 UML1# ifconfig eth0 192.168.0.254 1116 - 1117 - 1118 - 1119 - 1120 1158 1121 1159 1122 1160 UML2# ifconfig eth0 192.168.0.253 ··· 1122 1168 1123 1169 and they should be able to talk to each other. 1124 1170 1125 - The full set of command line options for this transport are 1171 + The full set of command line options for this transport are:: 1126 1172 1127 1173 1128 1174 ··· 1131 1177 1132 1178 1133 1179 1134 - 1135 - Harald's original README is here <http://user-mode-linux.source- 1136 - forge.net/> and explains these in detail, as well as 1137 - some other issues. 
1138 - 1139 1180 There is also a related point-to-point only "ucast" transport. 1140 1181 This is useful when your network does not support multicast, and 1141 1182 all network connections are simple point to point links. 1142 1183 1143 - The full set of command line options for this transport are 1184 + The full set of command line options for this transport are:: 1144 1185 1145 1186 1146 1187 ethn=ucast,ethernet address,remote address,listen port,remote port ··· 1143 1194 1144 1195 1145 1196 1146 - 6.6. TUN/TAP with the uml_net helper 1197 + 6.6. TUN/TAP with the uml_net helper 1198 + ------------------------------------- 1147 1199 1148 1200 TUN/TAP is the preferred mechanism on 2.4 to exchange packets with the 1149 1201 host. The TUN/TAP backend has been in UML since 2.4.9-3um. ··· 1166 1216 kernel or as the tun.o module. 1167 1217 1168 1218 The format of the command line switch to attach a device to a TUN/TAP 1169 - device is 1219 + device is:: 1170 1220 1171 1221 1172 1222 eth <n> =tuntap,,, <IP address> ··· 1176 1226 1177 1227 For example, this argument will attach the UML's eth0 to the next 1178 1228 available tap device and assign an ethernet address to it based on its 1179 - IP address 1229 + IP address:: 1180 1230 1181 1231 1182 1232 eth0=tuntap,,,192.168.0.254 ··· 1197 1247 There are a couple potential problems with running the TUN/TAP 1198 1248 transport on a 2.4 host kernel 1199 1249 1200 - o TUN/TAP seems not to work on 2.4.3 and earlier. Upgrade the host 1250 + - TUN/TAP seems not to work on 2.4.3 and earlier. Upgrade the host 1201 1251 kernel or use the ethertap transport. 1202 1252 1203 - o With an upgraded kernel, TUN/TAP may fail with 1253 + - With an upgraded kernel, TUN/TAP may fail with:: 1204 1254 1205 1255 1206 1256 File descriptor in bad state ··· 1213 1263 make sure that /usr/src/linux points to the headers for the running 1214 1264 kernel. 
1215 1265 1216 - These were pointed out by Tim Robinson <timro at trkr dot net> in 1217 - <http://www.geocrawler.com/> name="this uml- 1218 - user post"> . 1266 + These were pointed out by Tim Robinson <timro at trkr dot net> in the past. 1219 1267 1220 1268 1221 1269 1222 - 6.7. TUN/TAP with a preconfigured tap device 1270 + 6.7. TUN/TAP with a preconfigured tap device 1271 + --------------------------------------------- 1223 1272 1224 1273 If you prefer not to have UML use uml_net (which is somewhat 1225 1274 insecure), with UML 2.4.17-11, you can set up a TUN/TAP device ··· 1226 1277 there is no need for root assistance. Setting up the device is done 1227 1278 as follows: 1228 1279 1229 - o Create the device with tunctl (available from the UML utilities 1230 - tarball) 1280 + - Create the device with tunctl (available from the UML utilities 1281 + tarball):: 1231 1282 1232 1283 1233 1284 ··· 1240 1291 where uid is the user id or username that UML will be run as. This 1241 1292 will tell you what device was created. 1242 1293 1243 - o Configure the device IP (change IP addresses and device name to 1244 - suit) 1294 + - Configure the device IP (change IP addresses and device name to 1295 + suit):: 1245 1296 1246 1297 1247 1298 ··· 1252 1303 1253 1304 1254 1305 1255 - o Set up routing and arping if desired - this is my recipe, there are 1256 - other ways of doing the same thing 1306 + - Set up routing and arping if desired - this is my recipe, there are 1307 + other ways of doing the same thing:: 1257 1308 1258 1309 1259 1310 host# ··· 1262 1313 host# 1263 1314 route add -host 192.168.0.253 dev tap0 1264 1315 1265 - 1266 - 1267 - 1268 - 1269 - 1270 1316 host# 1271 1317 bash -c 'echo 1 > /proc/sys/net/ipv4/conf/tap0/proxy_arp' 1272 - 1273 - 1274 - 1275 - 1276 - 1277 1318 1278 1319 host# 1279 1320 arp -Ds 192.168.0.253 eth0 pub ··· 1277 1338 utility which reads the information from a config file and sets up 1278 1339 devices at boot time. 
1279 1340 1280 - o Rather than using up two IPs and ARPing for one of them, you can 1341 + - Rather than using up two IPs and ARPing for one of them, you can 1281 1342 also provide direct access to your LAN by the UML by using a 1282 - bridge. 1343 + bridge:: 1283 1344 1284 1345 1285 1346 host# 1286 1347 brctl addbr br0 1287 1348 1288 1349 1289 - 1290 - 1291 - 1292 - 1293 1350 host# 1294 1351 ifconfig eth0 0.0.0.0 promisc up 1295 - 1296 - 1297 - 1298 - 1299 1352 1300 1353 1301 1354 host# 1302 1355 ifconfig tap0 0.0.0.0 promisc up 1303 1356 1304 1357 1305 - 1306 - 1307 - 1308 - 1309 1358 host# 1310 1359 ifconfig br0 192.168.0.1 netmask 255.255.255.0 up 1311 1360 1312 1361 1313 - 1314 - 1315 - 1316 - 1317 - 1318 - host# 1319 - brctl stp br0 off 1320 - 1321 - 1322 - 1323 - 1362 + host# 1363 + brctl stp br0 off 1324 1364 1325 1365 1326 1366 host# 1327 1367 brctl setfd br0 1 1328 1368 1329 1369 1330 - 1331 - 1332 - 1333 - 1334 1370 host# 1335 1371 brctl sethello br0 1 1336 1372 1337 1373 1338 - 1339 - 1340 - 1341 - 1342 1374 host# 1343 1375 brctl addif br0 eth0 1344 - 1345 - 1346 - 1347 - 1348 1376 1349 1377 1350 1378 host# ··· 1323 1417 Note that 'br0' should be setup using ifconfig with the existing IP 1324 1418 address of eth0, as eth0 no longer has its own IP. 1325 1419 1326 - o 1420 + - 1327 1421 1328 1422 1329 1423 Also, the /dev/net/tun device must be writable by the user running 1330 1424 UML in order for the UML to use the device that's been configured 1331 - for it. The simplest thing to do is 1425 + for it. The simplest thing to do is:: 1332 1426 1333 1427 1334 1428 host# chmod 666 /dev/net/tun ··· 1344 1438 devices and chgrp /dev/net/tun to that group with mode 664 or 660. 1345 1439 1346 1440 1347 - o Once the device is set up, run UML with 'eth0=tuntap,device name' 1441 + - Once the device is set up, run UML with 'eth0=tuntap,device name' 1348 1442 (i.e. 'eth0=tuntap,tap0') on the command line (or do it with the 1349 1443 mconsole config command). 
1350 1444 1351 - o Bring the eth device up in UML and you're in business. 1445 + - Bring the eth device up in UML and you're in business. 1352 1446 1353 1447 If you don't want that tap device any more, you can make it non- 1354 - persistent with 1448 + persistent with:: 1355 1449 1356 1450 1357 1451 host# tunctl -d tap device ··· 1361 1455 1362 1456 Finally, tunctl has a -b (for brief mode) switch which causes it to 1363 1457 output only the name of the tap device it created. This makes it 1364 - suitable for capture by a script: 1458 + suitable for capture by a script:: 1365 1459 1366 1460 1367 1461 host# TAP=`tunctl -u 1000 -b` ··· 1371 1465 1372 1466 1373 1467 1374 - 6.8. Ethertap 1468 + 6.8. Ethertap 1469 + -------------- 1375 1470 1376 1471 Ethertap is the general mechanism on 2.2 for userspace processes to 1377 1472 exchange packets with the kernel. ··· 1380 1473 1381 1474 1382 1475 To use this transport, you need to describe the virtual network device 1383 - on the UML command line. The general format for this is 1476 + on the UML command line. The general format for this is:: 1384 1477 1385 1478 1386 1479 eth <n> =ethertap, <device> , <ethernet address> , <tap IP address> ··· 1388 1481 1389 1482 1390 1483 1391 - So, the previous example 1484 + So, the previous example:: 1392 1485 1393 1486 1394 1487 eth0=ethertap,tap0,fe:fd:0:0:0:1,192.168.0.254 ··· 1428 1521 1429 1522 If you want to set things up yourself, you need to make sure that the 1430 1523 appropriate /dev entry exists. If it doesn't, become root and create 1431 - it as follows: 1524 + it as follows:: 1432 1525 1433 1526 1434 1527 mknod /dev/tap <minor> c 36 <minor> + 16 ··· 1436 1529 1437 1530 1438 1531 1439 - For example, this is how to create /dev/tap0: 1532 + For example, this is how to create /dev/tap0:: 1440 1533 1441 1534 1442 1535 mknod /dev/tap0 c 36 0 + 16 ··· 1446 1539 1447 1540 You also need to make sure that the host kernel has ethertap support. 
1448 1541 If ethertap is enabled as a module, you apparently need to insmod 1449 - ethertap once for each ethertap device you want to enable. So, 1542 + ethertap once for each ethertap device you want to enable. So,:: 1450 1543 1451 1544 1452 1545 host# ··· 1456 1549 1457 1550 1458 1551 will give you the tap0 interface. To get the tap1 interface, you need 1459 - to run 1552 + to run:: 1460 1553 1461 1554 1462 1555 host# ··· 1468 1561 1469 1562 1470 1563 1471 - 6.9. The switch daemon 1564 + 6.9. The switch daemon 1565 + ----------------------- 1472 1566 1473 1567 Note: This is the daemon formerly known as uml_router, but which was 1474 1568 renamed so the network weenies of the world would stop growling at me. ··· 1485 1577 sockets. 1486 1578 1487 1579 1488 - If you want it to listen on a different pair of sockets, use 1580 + If you want it to listen on a different pair of sockets, use:: 1489 1581 1490 1582 1491 1583 -unix control socket data socket ··· 1494 1586 1495 1587 1496 1588 1497 - If you want it to act as a hub rather than a switch, use 1589 + If you want it to act as a hub rather than a switch, use:: 1498 1590 1499 1591 1500 1592 -hub ··· 1504 1596 1505 1597 1506 1598 If you want the switch to be connected to host networking (allowing 1507 - the umls to get access to the outside world through the host), use 1599 + the umls to get access to the outside world through the host), use:: 1508 1600 1509 1601 1510 1602 -tap tap0 ··· 1518 1610 device than tap0, specify that instead of tap0. 1519 1611 1520 1612 1521 - uml_switch can be backgrounded as follows 1613 + uml_switch can be backgrounded as follows:: 1522 1614 1523 1615 1524 1616 host% ··· 1531 1623 stdin for EOF. When it sees that, it exits. 1532 1624 1533 1625 1534 - The general format of the kernel command line switch is 1626 + The general format of the kernel command line switch is:: 1535 1627 1536 1628 1537 1629 ··· 1547 1639 how to communicate with the daemon. 
You should only specify them if 1548 1640 you told the daemon to use different sockets than the default. So, if 1549 1641 you ran the daemon with no arguments, running the UML on the same 1550 - machine with 1642 + machine with:: 1643 + 1551 1644 eth0=daemon 1552 1645 1553 1646 ··· 1558 1649 1559 1650 1560 1651 1561 - 6.10. Slip 1652 + 6.10. Slip 1653 + ----------- 1562 1654 1563 1655 Slip is another, less general, mechanism for a process to communicate 1564 1656 with the host networking. In contrast to the ethertap interface, ··· 1568 1658 IP. 1569 1659 1570 1660 1571 - The general format of the command line switch is 1661 + The general format of the command line switch is:: 1572 1662 1573 1663 1574 1664 ··· 1591 1681 1592 1682 1593 1683 1594 - 6.11. Slirp 1684 + 6.11. Slirp 1685 + ------------ 1595 1686 1596 1687 slirp uses an external program, usually /usr/bin/slirp, to provide IP 1597 1688 only networking connectivity through the host. This is similar to IP ··· 1602 1691 root access or setuid binaries on the host. 1603 1692 1604 1693 1605 - The general format of the command line switch for slirp is: 1694 + The general format of the command line switch for slirp is:: 1606 1695 1607 1696 1608 1697 ··· 1627 1716 The eth0 interface on UML should be set up with the IP 10.2.0.15, 1628 1717 although you can use anything as long as it is not used by a network 1629 1718 you will be connecting to. The default route on UML should be set to 1630 - use 1719 + use:: 1631 1720 1632 1721 1633 1722 UML# ··· 1648 1737 1649 1738 1650 1739 1651 - 6.12. pcap 1740 + 6.12. pcap 1741 + ----------- 1652 1742 1653 1743 The pcap transport is attached to a UML ethernet device on the command 1654 - line or with uml_mconsole with the following syntax: 1744 + line or with uml_mconsole with the following syntax:: 1655 1745 1656 1746 1657 1747 ··· 1674 1762 expression optimizer is used. 
1675 1763 1676 1764 1677 - Example: 1765 + Example:: 1678 1766 1679 1767 1680 1768 ··· 1689 1777 1690 1778 1691 1779 1692 - 6.13. Setting up the host yourself 1780 + 6.13. Setting up the host yourself 1781 + ----------------------------------- 1693 1782 1694 1783 If you don't specify an address for the host side of the ethertap or 1695 1784 slip device, UML won't do any setup on the host. So this is what is ··· 1698 1785 192.168.0.251 and a UML-side IP of 192.168.0.250 - adjust to suit your 1699 1786 own network): 1700 1787 1701 - o The device needs to be configured with its IP address. Tap devices 1788 + - The device needs to be configured with its IP address. Tap devices 1702 1789 are also configured with an mtu of 1484. Slip devices are 1703 1790 configured with a point-to-point address pointing at the UML ip 1704 - address. 1791 + address:: 1705 1792 1706 1793 1707 1794 host# ifconfig tap0 arp mtu 1484 192.168.0.251 up 1708 - 1709 - 1710 - 1711 - 1712 1795 1713 1796 1714 1797 host# ··· 1714 1805 1715 1806 1716 1807 1717 - o If a tap device is being set up, a route is set to the UML IP. 1808 + - If a tap device is being set up, a route is set to the UML IP:: 1718 1809 1719 1810 1720 1811 UML# route add -host 192.168.0.250 gw 192.168.0.251 ··· 1723 1814 1724 1815 1725 1816 1726 - o To allow other hosts on your network to see the virtual machine, 1727 - proxy arp is set up for it. 1817 + - To allow other hosts on your network to see the virtual machine, 1818 + proxy arp is set up for it:: 1728 1819 1729 1820 1730 1821 host# arp -Ds 192.168.0.250 eth0 pub ··· 1733 1824 1734 1825 1735 1826 1736 - o Finally, the host is set up to route packets. 1827 + - Finally, the host is set up to route packets:: 1737 1828 1738 1829 1739 1830 host# echo 1 > /proc/sys/net/ipv4/ip_forward ··· 1747 1838 1748 1839 1749 1840 1750 - 7. Sharing Filesystems between Virtual Machines 1841 + 7. 
Sharing Filesystems between Virtual Machines 1842 + ================================================ 1751 1843 1752 1844 1753 1845 1754 1846 1755 - 7.1. A warning 1847 + 7.1. A warning 1848 + --------------- 1756 1849 1757 1850 Don't attempt to share filesystems simply by booting two UMLs from the 1758 1851 same file. That's the same thing as booting two physical machines ··· 1762 1851 1763 1852 1764 1853 1765 - 7.2. Using layered block devices 1854 + 7.2. Using layered block devices 1855 + --------------------------------- 1766 1856 1767 1857 The way to share a filesystem between two virtual machines is to use 1768 1858 the copy-on-write (COW) layering capability of the ubd block driver. ··· 1784 1872 1785 1873 1786 1874 To add a copy-on-write layer to an existing block device file, simply 1787 - add the name of the COW file to the appropriate ubd switch: 1875 + add the name of the COW file to the appropriate ubd switch:: 1788 1876 1789 1877 1790 1878 ubd0=root_fs_cow,root_fs_debian_22 ··· 1795 1883 where 'root_fs_cow' is the private COW file and 'root_fs_debian_22' is 1796 1884 the existing shared filesystem. The COW file need not exist. If it 1797 1885 doesn't, the driver will create and initialize it. Once the COW file 1798 - has been initialized, it can be used on its own on the command line: 1886 + has been initialized, it can be used on its own on the command line:: 1799 1887 1800 1888 1801 1889 ubd0=root_fs_cow ··· 1808 1896 1809 1897 1810 1898 1811 - 7.3. Note! 1899 + 7.3. Note! 1900 + ----------- 1812 1901 1813 1902 When checking the size of the COW file in order to see the gobs of 1814 1903 space that you're saving, make sure you use 'ls -ls' to see the actual 1815 1904 disk consumption rather than the length of the file. The COW file is 1816 1905 sparse, so the length will be very different from the disk usage. 
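The difference between a sparse file's length and its actual disk consumption, described in section 7.3 above, can also be checked with a small script. The following is a minimal sketch, not part of the UML tools: the helper name `sparse_report` is hypothetical, and it assumes GNU coreutils `stat` and `truncate` are available on the host.

```shell
#!/bin/sh
# Hypothetical helper (not part of the UML utilities): for each file,
# print its apparent length and the space actually allocated on disk.
# Assumes GNU coreutils stat: %s = length in bytes, %b = number of
# allocated blocks, %B = size in bytes of each block counted by %b.
sparse_report() {
    for f in "$@"; do
        length=$(stat -c %s "$f")
        allocated=$(( $(stat -c %b "$f") * $(stat -c %B "$f") ))
        printf '%s length=%s allocated=%s\n' "$f" "$length" "$allocated"
    done
}

# Demonstration on a throwaway sparse file. truncate extends the file
# without writing data, so most filesystems allocate few or no blocks.
demo=$(mktemp)
truncate -s 100M "$demo"
sparse_report "$demo"
rm -f "$demo"
```

For a COW file, the same helper should report a length close to that of the backing file but a much smaller allocated size, which is the saving that 'ls -ls' makes visible.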
1817 1906 Here is a 'ls -l' of a COW file and backing file from one boot and 1818 - shutdown: 1907 + shutdown:: 1908 + 1819 1909 host% ls -l cow.debian debian2.2 1820 1910 -rw-r--r-- 1 jdike jdike 492504064 Aug 6 21:16 cow.debian 1821 1911 -rwxrw-rw- 1 jdike jdike 537919488 Aug 6 20:42 debian2.2 ··· 1825 1911 1826 1912 1827 1913 1828 - Doesn't look like much saved space, does it? Well, here's 'ls -ls': 1914 + Doesn't look like much saved space, does it? Well, here's 'ls -ls':: 1829 1915 1830 1916 1831 1917 host% ls -ls cow.debian debian2.2 ··· 1840 1926 1841 1927 1842 1928 1843 - 7.4. Another warning 1929 + 7.4. Another warning 1930 + --------------------- 1844 1931 1845 1932 Once a filesystem is being used as a readonly backing file for a COW 1846 1933 file, do not boot directly from it or modify it in any way. Doing so ··· 1867 1952 1868 1953 1869 1954 1870 - 7.5. uml_moo : Merging a COW file with its backing file 1955 + 7.5. uml_moo : Merging a COW file with its backing file 1956 + -------------------------------------------------------- 1871 1957 1872 1958 Depending on how you use UML and COW devices, it may be advisable to 1873 1959 merge the changes in the COW file into the backing file every once in ··· 1877 1961 1878 1962 1879 1963 1880 - The utility that does this is uml_moo. Its usage is 1964 + The utility that does this is uml_moo. Its usage is:: 1881 1965 1882 1966 1883 1967 host% uml_moo COW file new backing file ··· 1907 1991 1908 1992 uml_moo is installed with the UML deb and RPM. If you didn't install 1909 1993 UML from one of those packages, you can also get it from the UML 1910 - utilities <http://user-mode-linux.sourceforge.net/ 1911 - utilities> tar file in tools/moo. 1994 + utilities http://user-mode-linux.sourceforge.net/utilities tar file 1995 + in tools/moo. 1912 1996 1913 1997 1914 1998 ··· 1917 2001 1918 2002 1919 2003 1920 - 8. Creating filesystems 2004 + 8. 
Creating filesystems 2005 + ======================== 1921 2006 1922 2007 1923 2008 You may want to create and mount new UML filesystems, either because ··· 1932 2015 should be easy to translate to the filesystem of your choice. 1933 2016 1934 2017 1935 - 8.1. Create the filesystem file 2018 + 8.1. Create the filesystem file 2019 + ================================ 1936 2020 1937 2021 dd is your friend. All you need to do is tell dd to create an empty 1938 2022 file of the appropriate size. I usually make it sparse to save time 1939 2023 and to avoid allocating disk space until it's actually used. For 1940 2024 example, the following command will create a sparse 100 meg file full 1941 - of zeroes. 2025 + of zeroes:: 1942 2026 1943 2027 1944 2028 host% ··· 1952 2034 1953 2035 8.2. Assign the file to a UML device 1954 2036 1955 - Add an argument like the following to the UML command line: 2037 + Add an argument like the following to the UML command line:: 1956 2038 1957 - ubd4=new_filesystem 2039 + ubd4=new_filesystem 1958 2040 1959 2041 1960 2042 ··· 1971 2053 etc), then get them into UML by way of the net or hostfs. 1972 2054 1973 2055 1974 - Make the new filesystem on the device assigned to the new file: 2056 + Make the new filesystem on the device assigned to the new file:: 1975 2057 1976 2058 1977 2059 host# mkreiserfs /dev/ubd/4 ··· 1995 2077 1996 2078 1997 2079 1998 - Now, mount it: 2080 + Now, mount it:: 1999 2081 2000 2082 2001 2083 UML# ··· 2014 2096 2015 2097 2016 2098 2017 - 9. Host file access 2099 + 9. Host file access 2100 + ==================== 2018 2101 2019 2102 2020 2103 If you want to access files on the host machine from inside UML, you ··· 2031 2112 files contained in it just as you would on the host. 2032 2113 2033 2114 2034 - 9.1. Using hostfs 2115 + 9.1. 
Using hostfs 2116 + ------------------ 2035 2117 2036 2118 To begin with, make sure that hostfs is available inside the virtual 2037 - machine with 2119 + machine with:: 2038 2120 2039 2121 2040 2122 UML# cat /proc/filesystems ··· 2047 2127 module and available inside the virtual machine, and insmod it. 2048 2128 2049 2129 2050 - Now all you need to do is run mount: 2130 + Now all you need to do is run mount:: 2051 2131 2052 2132 2053 2133 UML# mount none /mnt/host -t hostfs ··· 2059 2139 2060 2140 2061 2141 If you don't want to mount the host root directory, then you can 2062 - specify a subdirectory to mount with the -o switch to mount: 2142 + specify a subdirectory to mount with the -o switch to mount:: 2063 2143 2064 2144 2065 2145 UML# mount none /mnt/home -t hostfs -o /home ··· 2071 2151 2072 2152 2073 2153 2074 - 9.2. hostfs as the root filesystem 2154 + 9.2. hostfs as the root filesystem 2155 + ----------------------------------- 2075 2156 2076 2157 It's possible to boot from a directory hierarchy on the host using 2077 2158 hostfs rather than using the standard filesystem in a file. 2078 2159 2079 2160 To start, you need that hierarchy. The easiest way is to loop mount 2080 - an existing root_fs file: 2161 + an existing root_fs file:: 2081 2162 2082 2163 2083 2164 host# mount root_fs uml_root_dir -o loop ··· 2087 2166 2088 2167 2089 2168 You need to change the filesystem type of / in etc/fstab to be 2090 - 'hostfs', so that line looks like this: 2169 + 'hostfs', so that line looks like this:: 2091 2170 2092 - /dev/ubd/0 / hostfs defaults 1 1 2171 + /dev/ubd/0 / hostfs defaults 1 1 2093 2172 2094 2173 2095 2174 2096 2175 2097 2176 Then you need to chown to yourself all the files in that directory 2098 - that are owned by root. This worked for me: 2177 + that are owned by root. This worked for me:: 2099 2178 2100 2179 2101 2180 host# find . 
-uid 0 -exec chown jdike {} \; ··· 2104 2183 2105 2184 2106 2185 Next, make sure that your UML kernel has hostfs compiled in, not as a 2107 - module. Then run UML with the boot device pointing at that directory: 2186 + module. Then run UML with the boot device pointing at that directory:: 2108 2187 2109 2188 2110 2189 ubd0=/path/to/uml/root/directory ··· 2115 2194 UML should then boot as it does normally. 2116 2195 2117 2196 2118 - 9.3. Building hostfs 2197 + 9.3. Building hostfs 2198 + --------------------- 2119 2199 2120 2200 If you need to build hostfs because it's not in your kernel, you have 2121 2201 two choices: 2122 2202 2123 2203 2124 2204 2125 - o Compiling hostfs into the kernel: 2205 + - Compiling hostfs into the kernel: 2126 2206 2127 2207 2128 2208 Reconfigure the kernel and set the 'Host filesystem' option under 2129 2209 2130 2210 2131 - o Compiling hostfs as a module: 2211 + - Compiling hostfs as a module: 2132 2212 2133 2213 2134 2214 Reconfigure the kernel and set the 'Host filesystem' option under 2135 2215 be in arch/um/fs/hostfs/hostfs.o. Install that in 2136 - /lib/modules/`uname -r`/fs in the virtual machine, boot it up, and 2216 + ``/lib/modules/$(uname -r)/fs`` in the virtual machine, boot it up, and:: 2137 2217 2138 2218 2139 2219 UML# insmod hostfs 2140 2220 2141 2221 2222 + .. _The_Management_Console: 2142 2223 2143 - 2144 - 2145 - 2146 - 2147 - 2148 - 2149 - 2150 - 2151 - 2152 - 10. The Management Console 2224 + 10. 
The Management Console 2225 + =========================== 2153 2226 2154 2227 2155 2228 ··· 2155 2240 2156 2241 There are a number of things you can do with the mconsole interface: 2157 2242 2158 - o get the kernel version 2243 + - get the kernel version 2159 2244 2160 - o add and remove devices 2245 + - add and remove devices 2161 2246 2162 - o halt or reboot the machine 2247 + - halt or reboot the machine 2163 2248 2164 - o Send SysRq commands 2249 + - Send SysRq commands 2165 2250 2166 - o Pause and resume the UML 2251 + - Pause and resume the UML 2167 2252 2168 2253 2169 2254 You need the mconsole client (uml_mconsole) which is present in CVS ··· 2172 2257 2173 2258 2174 2259 You also need CONFIG_MCONSOLE (under 'General Setup') enabled in UML. 2175 - When you boot UML, you'll see a line like: 2260 + When you boot UML, you'll see a line like:: 2176 2261 2177 2262 2178 2263 mconsole initialized on /home/jdike/.uml/umlNJ32yL/mconsole ··· 2180 2265 2181 2266 2182 2267 2183 - If you specify a unique machine id one the UML command line, i.e. 2268 + If you specify a unique machine id one the UML command line, i.e.:: 2184 2269 2185 2270 2186 2271 umid=debian ··· 2188 2273 2189 2274 2190 2275 2191 - you'll see this 2276 + you'll see this:: 2192 2277 2193 2278 2194 2279 mconsole initialized on /home/jdike/.uml/debian/mconsole ··· 2197 2282 2198 2283 2199 2284 That file is the socket that uml_mconsole will use to communicate with 2200 - UML. Run it with either the umid or the full path as its argument: 2285 + UML. 
Run it with either the umid or the full path as its argument:: 2201 2286 2202 2287 2203 2288 host% uml_mconsole debian ··· 2205 2290 2206 2291 2207 2292 2208 - or 2293 + or:: 2209 2294 2210 2295 2211 2296 host% uml_mconsole /home/jdike/.uml/debian/mconsole ··· 2215 2300 2216 2301 You'll get a prompt, at which you can run one of these commands: 2217 2302 2218 - o version 2303 + - version 2219 2304 2220 - o halt 2305 + - halt 2221 2306 2222 - o reboot 2307 + - reboot 2223 2308 2224 - o config 2309 + - config 2225 2310 2226 - o remove 2311 + - remove 2227 2312 2228 - o sysrq 2313 + - sysrq 2229 2314 2230 - o help 2315 + - help 2231 2316 2232 - o cad 2317 + - cad 2233 2318 2234 - o stop 2319 + - stop 2235 2320 2236 - o go 2321 + - go 2237 2322 2238 2323 2239 - 10.1. version 2324 + 10.1. version 2325 + -------------- 2240 2326 2241 - This takes no arguments. It prints the UML version. 2327 + This takes no arguments. It prints the UML version:: 2242 2328 2243 2329 2244 2330 (mconsole) version ··· 2258 2342 2259 2343 2260 2344 2261 - 10.2. halt and reboot 2345 + 10.2. halt and reboot 2346 + ---------------------- 2262 2347 2263 2348 These take no arguments. They shut the machine down immediately, with 2264 2349 no syncing of disks and no clean shutdown of userspace. So, they are 2265 - pretty close to crashing the machine. 2350 + pretty close to crashing the machine:: 2266 2351 2267 2352 2268 2353 (mconsole) halt ··· 2274 2357 2275 2358 2276 2359 2277 - 10.3. config 2360 + 10.3. config 2361 + ------------- 2278 2362 2279 2363 "config" adds a new device to the virtual machine. Currently the ubd 2280 2364 and network drivers support this. It takes one argument, which is the 2281 - device to add, with the same syntax as the kernel command line. 
2365 + device to add, with the same syntax as the kernel command line:: 2282 2366 2283 2367 2284 2368 2285 2369 2286 - (mconsole) 2287 - config ubd3=/home/jdike/incoming/roots/root_fs_debian22 2370 + (mconsole) 2371 + config ubd3=/home/jdike/incoming/roots/root_fs_debian22 2288 2372 2289 - OK 2290 - (mconsole) config eth1=mcast 2291 - OK 2292 - 2293 - 2373 + OK 2374 + (mconsole) config eth1=mcast 2375 + OK 2294 2376 2295 2377 2296 2378 2297 2379 2298 - 10.4. remove 2380 + 2381 + 2382 + 10.4. remove 2383 + ------------- 2299 2384 2300 2385 "remove" deletes a device from the system. Its argument is just the 2301 2386 name of the device to be removed. The device must be idle in whatever 2302 2387 sense the driver considers necessary. In the case of the ubd driver, 2303 2388 the removed block device must not be mounted, swapped on, or otherwise 2304 - open, and in the case of the network driver, the device must be down. 2389 + open, and in the case of the network driver, the device must be down:: 2305 2390 2306 2391 2307 2392 (mconsole) remove ubd3 ··· 2316 2397 2317 2398 2318 2399 2319 - 10.5. sysrq 2400 + 10.5. sysrq 2401 + ------------ 2320 2402 2321 2403 This takes one argument, which is a single letter. It calls the 2322 2404 generic kernel's SysRq driver, which does whatever is called for by ··· 2327 2407 2328 2408 2329 2409 2330 - 10.6. help 2410 + 10.6. help 2411 + ----------- 2331 2412 2332 2413 "help" returns a string listing the valid commands and what each one 2333 2414 does. 2334 2415 2335 2416 2336 2417 2337 - 10.7. cad 2418 + 10.7. cad 2419 + ---------- 2338 2420 2339 2421 This invokes the Ctl-Alt-Del action on init. What exactly this ends 2340 2422 up doing is up to /etc/inittab. Normally, it reboots the machine. 
2341 2423 With UML, this is usually not desired, so if a halt would be better, 2342 - then find the section of inittab that looks like this 2424 + then find the section of inittab that looks like this:: 2343 2425 2344 2426 2345 2427 # What to do when CTRL-ALT-DEL is pressed. ··· 2354 2432 2355 2433 2356 2434 2357 - 10.8. stop 2435 + 10.8. stop 2436 + ----------- 2358 2437 2359 2438 This puts the UML in a loop reading mconsole requests until a 'go' 2360 2439 mconsole command is received. This is very useful for making backups ··· 2371 2448 2372 2449 2373 2450 2374 - 10.9. go 2451 + 10.9. go 2452 + --------- 2375 2453 2376 2454 This resumes a UML after being paused by a 'stop' command. Note that 2377 2455 when the UML has resumed, TCP connections may have timed out and if ··· 2384 2460 2385 2461 2386 2462 2463 + .. _Kernel_debugging: 2387 2464 2388 - 2389 - 11. Kernel debugging 2465 + 11. Kernel debugging 2466 + ===================== 2390 2467 2391 2468 2392 2469 Note: The interface that makes debugging, as described here, possible ··· 2402 2477 2403 2478 2404 2479 In order to debug the kernel, you need build it from source. See 2405 - ``Compiling the kernel and modules'' for information on doing that. 2480 + :ref:`Compiling_the_kernel_and_modules` for information on doing that. 2406 2481 Make sure that you enable CONFIG_DEBUGSYM and CONFIG_PT_PROXY during 2407 - the config. These will compile the kernel with -g, and enable the 2482 + the config. These will compile the kernel with ``-g``, and enable the 2408 2483 ptrace proxy so that gdb works with UML, respectively. 2409 2484 2410 2485 2411 2486 2412 2487 2413 - 11.1. Starting the kernel under gdb 2488 + 11.1. Starting the kernel under gdb 2489 + ------------------------------------ 2414 2490 2415 2491 You can have the kernel running under the control of gdb from the 2416 2492 beginning by putting 'debug' on the command line. 
You will get an ··· 2424 2498 There is a transcript of a debugging session here <debug- 2425 2499 session.html> , with breakpoints being set in the scheduler and in an 2426 2500 interrupt handler. 2427 - 11.2. Examining sleeping processes 2501 + 2502 + 2503 + 11.2. Examining sleeping processes 2504 + ----------------------------------- 2505 + 2428 2506 2429 2507 Not every bug is evident in the currently running process. Sometimes, 2430 2508 processes hang in the kernel when they shouldn't because they've ··· 2446 2516 2447 2517 Now what you do is this: 2448 2518 2449 - o detach from the current thread 2519 + - detach from the current thread:: 2450 2520 2451 2521 2452 2522 (UML gdb) det ··· 2455 2525 2456 2526 2457 2527 2458 - o attach to the thread you are interested in 2528 + - attach to the thread you are interested in:: 2459 2529 2460 2530 2461 2531 (UML gdb) att <host pid> ··· 2464 2534 2465 2535 2466 2536 2467 - o look at its stack and anything else of interest 2537 + - look at its stack and anything else of interest:: 2468 2538 2469 2539 2470 2540 (UML gdb) bt ··· 2475 2545 Note that you can't do anything at this point that requires that a 2476 2546 process execute, e.g. calling a function 2477 2547 2478 - o when you're done looking at that process, reattach to the current 2479 - thread and continue it 2548 + - when you're done looking at that process, reattach to the current 2549 + thread and continue it:: 2480 2550 2481 2551 2482 2552 (UML gdb) 2483 2553 att 1 2484 - 2485 - 2486 - 2487 - 2488 2554 2489 2555 2490 2556 (UML gdb) ··· 2495 2569 2496 2570 2497 2571 2498 - 11.3. Running ddd on UML 2572 + 11.3. Running ddd on UML 2573 + ------------------------- 2499 2574 2500 2575 ddd works on UML, but requires a special kludge. The process goes 2501 2576 like this: 2502 2577 2503 - o Start ddd 2578 + - Start ddd:: 2504 2579 2505 2580 2506 2581 host% ddd linux ··· 2510 2583 2511 2584 2512 2585 2513 - o With ps, get the pid of the gdb that ddd started. 
You can ask the 2586 + - With ps, get the pid of the gdb that ddd started. You can ask the 2514 2587 gdb to tell you, but for some reason that confuses things and 2515 2588 causes a hang. 2516 2589 2517 - o run UML with 'debug=parent gdb-pid=<pid>' added to the command line 2590 + - run UML with 'debug=parent gdb-pid=<pid>' added to the command line 2518 2591 - it will just sit there after you hit return 2519 2592 2520 - o type 'att 1' to the ddd gdb and you will see something like 2593 + - type 'att 1' to the ddd gdb and you will see something like:: 2521 2594 2522 2595 2523 2596 0xa013dc51 in __kill () ··· 2529 2602 2530 2603 2531 2604 2532 - o At this point, type 'c', UML will boot up, and you can use ddd just 2605 + - At this point, type 'c', UML will boot up, and you can use ddd just 2533 2606 as you do on any other process. 2534 2607 2535 2608 2536 2609 2537 - 11.4. Debugging modules 2610 + 11.4. Debugging modules 2611 + ------------------------ 2612 + 2538 2613 2539 2614 gdb has support for debugging code which is dynamically loaded into 2540 2615 the process. This support is what is needed to debug kernel modules ··· 2558 2629 2559 2630 2560 2631 First, you must tell it where your modules are. There is a list in 2561 - the script that looks like this: 2632 + the script that looks like this:: 2633 + 2562 2634 set MODULE_PATHS { 2563 2635 "fat" "/usr/src/uml/linux-2.4.18/fs/fat/fat.o" 2564 2636 "isofs" "/usr/src/uml/linux-2.4.18/fs/isofs/isofs.o" ··· 2571 2641 2572 2642 You change that to list the names and paths of the modules that you 2573 2643 are going to debug. 
Then you run it from the toplevel directory of 2574 - your UML pool and it basically tells you what to do: 2575 - 2576 - 2644 + your UML pool and it basically tells you what to do:: 2577 2645 2578 2646 2579 2647 ******** GDB pid is 21903 ******** ··· 2594 2666 2595 2667 2596 2668 After you run UML and it sits there doing nothing, you hit return at 2597 - the 'att 1' and continue it: 2669 + the 'att 1' and continue it:: 2598 2670 2599 2671 2600 2672 Attaching to program: /home/jdike/linux/2.4/um/./linux, process 1 ··· 2606 2678 2607 2679 2608 2680 At this point, you debug normally. When you insmod something, the 2609 - expect magic will kick in and you'll see something like: 2681 + expect magic will kick in and you'll see something like:: 2610 2682 2611 2683 2684 + *** Module hostfs loaded *** 2685 + Breakpoint 1, sys_init_module (name_user=0x805abb0 "hostfs", 2686 + mod_user=0x8070e00) at module.c:349 2687 + 349 char *name, *n_name, *name_tmp = NULL; 2688 + (UML gdb) finish 2689 + Run till exit from #0 sys_init_module (name_user=0x805abb0 "hostfs", 2690 + mod_user=0x8070e00) at module.c:349 2691 + 0xa00e2e23 in execute_syscall (r=0xa8140284) at syscall_kern.c:411 2692 + 411 else res = EXECUTE_SYSCALL(syscall, regs); 2693 + Value returned is $1 = 0 2694 + (UML gdb) 2695 + p/x (int)module_list + module_list->size_of_struct 2612 2696 2697 + $2 = 0xa9021054 2698 + (UML gdb) symbol-file ./linux 2699 + Load new symbol table from "./linux"? (y or n) y 2700 + Reading symbols from ./linux... 2701 + done. 
2702 + (UML gdb) 2703 + add-symbol-file /home/jdike/linux/2.4/um/arch/um/fs/hostfs/hostfs.o 0xa9021054 2613 2704 2705 + add symbol table from file "/home/jdike/linux/2.4/um/arch/um/fs/hostfs/hostfs.o" at 2706 + .text_addr = 0xa9021054 2707 + (y or n) y 2614 2708 2615 - 2616 - 2617 - 2618 - 2619 - 2620 - 2621 - 2622 - 2623 - 2624 - 2625 - 2626 - 2627 - *** Module hostfs loaded *** 2628 - Breakpoint 1, sys_init_module (name_user=0x805abb0 "hostfs", 2629 - mod_user=0x8070e00) at module.c:349 2630 - 349 char *name, *n_name, *name_tmp = NULL; 2631 - (UML gdb) finish 2632 - Run till exit from #0 sys_init_module (name_user=0x805abb0 "hostfs", 2633 - mod_user=0x8070e00) at module.c:349 2634 - 0xa00e2e23 in execute_syscall (r=0xa8140284) at syscall_kern.c:411 2635 - 411 else res = EXECUTE_SYSCALL(syscall, regs); 2636 - Value returned is $1 = 0 2637 - (UML gdb) 2638 - p/x (int)module_list + module_list->size_of_struct 2639 - 2640 - $2 = 0xa9021054 2641 - (UML gdb) symbol-file ./linux 2642 - Load new symbol table from "./linux"? (y or n) y 2643 - Reading symbols from ./linux... 2644 - done. 2645 - (UML gdb) 2646 - add-symbol-file /home/jdike/linux/2.4/um/arch/um/fs/hostfs/hostfs.o 0xa9021054 2647 - 2648 - add symbol table from file "/home/jdike/linux/2.4/um/arch/um/fs/hostfs/hostfs.o" at 2649 - .text_addr = 0xa9021054 2650 - (y or n) y 2651 - 2652 - Reading symbols from /home/jdike/linux/2.4/um/arch/um/fs/hostfs/hostfs.o... 2653 - done. 
2654 - (UML gdb) p *module_list 2655 - $1 = {size_of_struct = 84, next = 0xa0178720, name = 0xa9022de0 "hostfs", 2656 - size = 9016, uc = {usecount = {counter = 0}, pad = 0}, flags = 1, 2657 - nsyms = 57, ndeps = 0, syms = 0xa9023170, deps = 0x0, refs = 0x0, 2658 - init = 0xa90221f0 <init_hostfs>, cleanup = 0xa902222c <exit_hostfs>, 2659 - ex_table_start = 0x0, ex_table_end = 0x0, persist_start = 0x0, 2660 - persist_end = 0x0, can_unload = 0, runsize = 0, kallsyms_start = 0x0, 2661 - kallsyms_end = 0x0, 2662 - archdata_start = 0x1b855 <Address 0x1b855 out of bounds>, 2663 - archdata_end = 0xe5890000 <Address 0xe5890000 out of bounds>, 2664 - kernel_data = 0xf689c35d <Address 0xf689c35d out of bounds>} 2665 - >> Finished loading symbols for hostfs ... 2709 + Reading symbols from /home/jdike/linux/2.4/um/arch/um/fs/hostfs/hostfs.o... 2710 + done. 2711 + (UML gdb) p *module_list 2712 + $1 = {size_of_struct = 84, next = 0xa0178720, name = 0xa9022de0 "hostfs", 2713 + size = 9016, uc = {usecount = {counter = 0}, pad = 0}, flags = 1, 2714 + nsyms = 57, ndeps = 0, syms = 0xa9023170, deps = 0x0, refs = 0x0, 2715 + init = 0xa90221f0 <init_hostfs>, cleanup = 0xa902222c <exit_hostfs>, 2716 + ex_table_start = 0x0, ex_table_end = 0x0, persist_start = 0x0, 2717 + persist_end = 0x0, can_unload = 0, runsize = 0, kallsyms_start = 0x0, 2718 + kallsyms_end = 0x0, 2719 + archdata_start = 0x1b855 <Address 0x1b855 out of bounds>, 2720 + archdata_end = 0xe5890000 <Address 0xe5890000 out of bounds>, 2721 + kernel_data = 0xf689c35d <Address 0xf689c35d out of bounds>} 2722 + >> Finished loading symbols for hostfs ... 2666 2723 2667 2724 2668 2725 ··· 2657 2744 2658 2745 2659 2746 Boot the kernel under the debugger and load the module with insmod or 2660 - modprobe. With gdb, do: 2747 + modprobe. With gdb, do:: 2661 2748 2662 2749 2663 2750 (UML gdb) p module_list ··· 2671 2758 the name fields until find the module you want to debug. 
Take the 2672 2759 address of that structure, and add module.size_of_struct (which in 2673 2760 2.4.10 kernels is 96 (0x60)) to it. Gdb can make this hard addition 2674 - for you :-): 2761 + for you :-):: 2675 2762 2676 2763 2677 2764 2678 - (UML gdb) 2679 - printf "%#x\n", (int)module_list module_list->size_of_struct 2765 + (UML gdb) 2766 + printf "%#x\n", (int)module_list module_list->size_of_struct 2680 2767 2681 2768 2682 2769 ··· 2684 2771 The offset from the module start occasionally changes (before 2.4.0, 2685 2772 it was module.size_of_struct + 4), so it's a good idea to check the 2686 2773 init and cleanup addresses once in a while, as describe below. Now 2687 - do: 2774 + do:: 2688 2775 2689 2776 2690 2777 (UML gdb) ··· 2699 2786 If there's any doubt that you got the offset right, like breakpoints 2700 2787 appear not to work, or they're appearing in the wrong place, you can 2701 2788 check it by looking at the module structure. The init and cleanup 2702 - fields should look like: 2789 + fields should look like:: 2703 2790 2704 2791 2705 2792 init = 0x588066b0 <init_hostfs>, cleanup = 0x588066c0 <exit_hostfs> ··· 2714 2801 2715 2802 When you want to load in a new version of the module, you need to get 2716 2803 gdb to forget about the old one. The only way I've found to do that 2717 - is to tell gdb to forget about all symbols that it knows about: 2804 + is to tell gdb to forget about all symbols that it knows about:: 2718 2805 2719 2806 2720 2807 (UML gdb) symbol-file ··· 2722 2809 2723 2810 2724 2811 2725 - Then reload the symbols from the kernel binary: 2812 + Then reload the symbols from the kernel binary:: 2726 2813 2727 2814 2728 2815 (UML gdb) symbol-file /path/to/kernel ··· 2736 2823 2737 2824 2738 2825 2739 - 11.5. Attaching gdb to the kernel 2826 + 11.5. 
Attaching gdb to the kernel 2827 + ---------------------------------- 2740 2828 2741 2829 If you don't have the kernel running under gdb, you can attach gdb to 2742 2830 it later by sending the tracing thread a SIGUSR1. The first line of 2743 - the console output identifies its pid: 2831 + the console output identifies its pid:: 2832 + 2744 2833 tracing thread pid = 20093 2745 2834 2746 2835 2747 2836 2748 2837 2749 - When you send it the signal: 2838 + When you send it the signal:: 2750 2839 2751 2840 2752 2841 host% kill -USR1 20093 ··· 2760 2845 2761 2846 2762 2847 If you have the mconsole compiled into UML, then the mconsole client 2763 - can be used to start gdb: 2848 + can be used to start gdb:: 2764 2849 2765 2850 2766 2851 (mconsole) (mconsole) config gdb=xterm ··· 2772 2857 2773 2858 2774 2859 2775 - 11.6. Using alternate debuggers 2860 + 11.6. Using alternate debuggers 2861 + -------------------------------- 2776 2862 2777 2863 UML has support for attaching to an already running debugger rather 2778 2864 than starting gdb itself. This is present in CVS as of 17 Apr 2001. ··· 2802 2886 An example of an alternate debugger is strace. You can strace the 2803 2887 actual kernel as follows: 2804 2888 2805 - o Run the following in a shell 2889 + - Run the following in a shell:: 2806 2890 2807 2891 2808 2892 host% ··· 2810 2894 2811 2895 2812 2896 2813 - o Run UML with 'debug' and 'gdb-pid=<pid>' with the pid printed out 2897 + - Run UML with 'debug' and 'gdb-pid=<pid>' with the pid printed out 2814 2898 by the previous command 2815 2899 2816 - o Hit return in the shell, and UML will start running, and strace 2900 + - Hit return in the shell, and UML will start running, and strace 2817 2901 output will start accumulating in the output file. 2818 2902 2819 - Note that this is different from running 2903 + Note that this is different from running:: 2820 2904 2821 2905 2822 2906 host% strace ./linux ··· 2833 2917 2834 2918 2835 2919 2836 - 12. 
Kernel debugging examples 2920 + 12. Kernel debugging examples 2921 + ============================== 2837 2922 2838 - 12.1. The case of the hung fsck 2923 + 12.1. The case of the hung fsck 2924 + -------------------------------- 2839 2925 2840 2926 When booting up the kernel, fsck failed, and dropped me into a shell 2841 - to fix things up. I ran fsck -y, which hung: 2927 + to fix things up. I ran fsck -y, which hung:: 2842 2928 2843 2929 2930 + Setting hostname uml [ OK ] 2931 + Checking root filesystem 2932 + /dev/fhd0 was not cleanly unmounted, check forced. 2933 + Error reading block 86894 (Attempt to read block from filesystem resulted in short read) while reading indirect blocks of inode 19780. 2844 2934 2935 + /dev/fhd0: UNEXPECTED INCONSISTENCY; RUN fsck MANUALLY. 2936 + (i.e., without -a or -p options) 2937 + [ FAILED ] 2845 2938 2939 + *** An error occurred during the file system check. 2940 + *** Dropping you to a shell; the system will reboot 2941 + *** when you leave the shell. 2942 + Give root password for maintenance 2943 + (or type Control-D for normal startup): 2846 2944 2945 + [root@uml /root]# fsck -y /dev/fhd0 2946 + fsck -y /dev/fhd0 2947 + Parallelizing fsck version 1.14 (9-Jan-1999) 2948 + e2fsck 1.14, 9-Jan-1999 for EXT2 FS 0.5b, 95/08/09 2949 + /dev/fhd0 contains a file system with errors, check forced. 2950 + Pass 1: Checking inodes, blocks, and sizes 2951 + Error reading block 86894 (Attempt to read block from filesystem resulted in short read) while reading indirect blocks of inode 19780. Ignore error? yes 2847 2952 2953 + Inode 19780, i_blocks is 1548, should be 540. Fix? yes 2848 2954 2955 + Pass 2: Checking directory structure 2956 + Error reading block 49405 (Attempt to read block from filesystem resulted in short read). Ignore error? yes 2849 2957 2958 + Directory inode 11858, block 0, offset 0: directory corrupted 2959 + Salvage? yes 2850 2960 2961 + Missing '.' in directory inode 11858. 2962 + Fix? 
yes 2851 2963 2852 - 2853 - 2854 - 2855 - 2856 - 2857 - 2858 - 2859 - 2860 - 2861 - 2862 - 2863 - 2864 - 2865 - 2866 - 2867 - 2868 - 2869 - 2870 - 2871 - 2872 - 2873 - 2874 - 2875 - 2876 - 2877 - 2878 - 2879 - Setting hostname uml [ OK ] 2880 - Checking root filesystem 2881 - /dev/fhd0 was not cleanly unmounted, check forced. 2882 - Error reading block 86894 (Attempt to read block from filesystem resulted in short read) while reading indirect blocks of inode 19780. 2883 - 2884 - /dev/fhd0: UNEXPECTED INCONSISTENCY; RUN fsck MANUALLY. 2885 - (i.e., without -a or -p options) 2886 - [ FAILED ] 2887 - 2888 - *** An error occurred during the file system check. 2889 - *** Dropping you to a shell; the system will reboot 2890 - *** when you leave the shell. 2891 - Give root password for maintenance 2892 - (or type Control-D for normal startup): 2893 - 2894 - [root@uml /root]# fsck -y /dev/fhd0 2895 - fsck -y /dev/fhd0 2896 - Parallelizing fsck version 1.14 (9-Jan-1999) 2897 - e2fsck 1.14, 9-Jan-1999 for EXT2 FS 0.5b, 95/08/09 2898 - /dev/fhd0 contains a file system with errors, check forced. 2899 - Pass 1: Checking inodes, blocks, and sizes 2900 - Error reading block 86894 (Attempt to read block from filesystem resulted in short read) while reading indirect blocks of inode 19780. Ignore error? yes 2901 - 2902 - Inode 19780, i_blocks is 1548, should be 540. Fix? yes 2903 - 2904 - Pass 2: Checking directory structure 2905 - Error reading block 49405 (Attempt to read block from filesystem resulted in short read). Ignore error? yes 2906 - 2907 - Directory inode 11858, block 0, offset 0: directory corrupted 2908 - Salvage? yes 2909 - 2910 - Missing '.' in directory inode 11858. 2911 - Fix? yes 2912 - 2913 - Missing '..' in directory inode 11858. 2914 - Fix? yes 2915 - 2916 - 2917 - 2964 + Missing '..' in directory inode 11858. 2965 + Fix? 
yes 2918 2966 2919 2967 2920 2968 The standard drill in this sort of situation is to fire up gdb on the 2921 2969 signal thread, which, in this case, was pid 1935. In another window, 2922 - I run gdb and attach pid 1935. 2923 - 2924 - 2970 + I run gdb and attach pid 1935:: 2925 2971 2926 2972 2927 2973 ~/linux/2.3.26/um 1016: gdb linux ··· 2900 3022 0x100756d9 in __wait4 () 2901 3023 2902 3024 2903 - 2904 - 2905 - 2906 - 2907 - Let's see what's currently running: 3025 + Let's see what's currently running:: 2908 3026 2909 3027 2910 3028 ··· 2915 3041 reason and never woke up. 2916 3042 2917 3043 2918 - Let's guess that the last process in the process list is fsck: 3044 + Let's guess that the last process in the process list is fsck:: 2919 3045 2920 3046 2921 3047 ··· 2926 3052 2927 3053 2928 3054 2929 - It is, so let's see what it thinks it's up to: 3055 + It is, so let's see what it thinks it's up to:: 2930 3056 2931 3057 2932 3058 ··· 2942 3068 2943 3069 2944 3070 2945 - 2946 - 2947 3071 The interesting things here are the fact that its .thread.syscall.id 2948 3072 is __NR_write (see the big switch in arch/um/kernel/syscall_kern.c or 2949 3073 the defines in include/asm-um/arch/unistd.h), and that it never ··· 2953 3081 The fact that it never returned from write means that its stack should 2954 3082 be fairly interesting. Its pid is 1980 (.thread.extern_pid). That 2955 3083 process is being ptraced by the signal thread, so it must be detached 2956 - before gdb can attach it: 3084 + before gdb can attach it:: 2957 3085 2958 3086 2959 3087 3088 + (gdb) call detach(1980) 2960 3089 2961 - 2962 - 2963 - 2964 - 2965 - 2966 - 2967 - (gdb) call detach(1980) 2968 - 2969 - Program received signal SIGSEGV, Segmentation fault. 2970 - <function called from gdb> 2971 - The program being debugged stopped while in a function called from GDB. 
2972 - When the function (detach) is done executing, GDB will silently 2973 - stop (instead of continuing to evaluate the expression containing 2974 - the function call). 2975 - (gdb) call detach(1980) 2976 - $15 = 0 2977 - 2978 - 2979 - 3090 + Program received signal SIGSEGV, Segmentation fault. 3091 + <function called from gdb> 3092 + The program being debugged stopped while in a function called from GDB. 3093 + When the function (detach) is done executing, GDB will silently 3094 + stop (instead of continuing to evaluate the expression containing 3095 + the function call). 3096 + (gdb) call detach(1980) 3097 + $15 = 0 2980 3098 2981 3099 2982 3100 The first detach segfaults for some reason, and the second one ··· 2974 3112 2975 3113 2976 3114 Now I detach from the signal thread, attach to the fsck thread, and 2977 - look at its stack: 3115 + look at its stack:: 2978 3116 2979 3117 2980 3118 (gdb) det ··· 3014 3152 3015 3153 3016 3154 3017 - The interesting things here are : 3155 + The interesting things here are: 3018 3156 3019 - o There are two segfaults on this stack (frames 9 and 14) 3157 + - There are two segfaults on this stack (frames 9 and 14) 3020 3158 3021 - o The first faulting address (frame 11) is 0x50000800 3159 + - The first faulting address (frame 11) is 0x50000800:: 3022 3160 3023 - (gdb) p (void *)1342179328 3024 - $16 = (void *) 0x50000800 3161 + (gdb) p (void *)1342179328 3162 + $16 = (void *) 0x50000800 3025 3163 3026 3164 3027 3165 ··· 3037 3175 3038 3176 However, the more immediate problem is that second segfault and I'm 3039 3177 going to concentrate on that. 
First, I want to see where the fault 3040 - happened, so I have to go look at the sigcontent struct in frame 8: 3178 + happened, so I have to go look at the sigcontent struct in frame 8:: 3041 3179 3042 3180 3043 3181 ··· 3073 3211 3074 3212 3075 3213 3076 - That's not very useful, so I'll try a more manual method: 3214 + That's not very useful, so I'll try a more manual method:: 3077 3215 3078 3216 3079 3217 (gdb) p *((struct sigcontext *) (&sig + 1)) ··· 3086 3224 3087 3225 3088 3226 3089 - The ip is in handle_mm_fault: 3227 + The ip is in handle_mm_fault:: 3090 3228 3091 3229 3092 3230 (gdb) p (void *)268480945 ··· 3098 3236 3099 3237 3100 3238 3101 - Specifically, it's in pte_alloc: 3239 + Specifically, it's in pte_alloc:: 3102 3240 3103 3241 3104 3242 (gdb) i line *$20 ··· 3111 3249 3112 3250 3113 3251 To find where in handle_mm_fault this is, I'll jump forward in the 3114 - code until I see an address in that procedure: 3252 + code until I see an address in that procedure:: 3115 3253 3116 3254 3117 3255 ··· 3148 3286 3149 3287 3150 3288 Something is apparently wrong with the page tables or vma_structs, so 3151 - lets go back to frame 11 and have a look at them: 3289 + lets go back to frame 11 and have a look at them:: 3152 3290 3153 3291 3154 3292 3155 - #11 0x1006c0aa in segv (address=1342179328, is_write=2) at trap_kern.c:50 3156 - 50 handle_mm_fault(current, vma, address, is_write); 3157 - (gdb) call pgd_offset_proc(vma->vm_mm, address) 3158 - $22 = (pgd_t *) 0x80a548c 3293 + #11 0x1006c0aa in segv (address=1342179328, is_write=2) at trap_kern.c:50 3294 + 50 handle_mm_fault(current, vma, address, is_write); 3295 + (gdb) call pgd_offset_proc(vma->vm_mm, address) 3296 + $22 = (pgd_t *) 0x80a548c 3159 3297 3160 3298 3161 3299 3162 3300 3163 3301 3164 3302 That's pretty bogus. Page tables aren't supposed to be in process 3165 - text or data areas. Let's see what's in the vma: 3303 + text or data areas. 
Let's see what's in the vma:: 3166 3304 3167 3305 3168 3306 (gdb) p *vma ··· 3187 3325 3188 3326 3189 3327 3190 - 3191 - 3192 3328 This also pretty bogus. With all of the 0x80xxxxx and 0xaffffxxx 3193 3329 addresses, this is looking like a stack was plonked down on top of 3194 - these structures. Maybe it's a stack overflow from the next page: 3195 - 3330 + these structures. Maybe it's a stack overflow from the next page:: 3196 3331 3197 3332 3198 3333 (gdb) p vma ··· 3197 3338 3198 3339 3199 3340 3200 - 3201 - 3202 3341 That's towards the lower quarter of the page, so that would have to 3203 - have been pretty heavy stack overflow: 3342 + have been pretty heavy stack overflow:: 3204 3343 3205 3344 3206 - 3207 - 3208 - 3209 - 3210 - 3211 - 3212 - 3213 - 3214 - 3215 - 3216 - 3217 - 3218 - (gdb) x/100x $25 3219 - 0x507d2434: 0x507d2434 0x00000000 0x08048000 0x080a4f8c 3220 - 0x507d2444: 0x00000000 0x080a79e0 0x080a8c94 0x080d1000 3221 - 0x507d2454: 0xaffffdb0 0xaffffe63 0xaffffe7a 0xaffffe7a 3222 - 0x507d2464: 0xafffffec 0x00000062 0x0000008a 0x00000000 3223 - 0x507d2474: 0x00000000 0x00000000 0x00000000 0x00000000 3224 - 0x507d2484: 0x00000000 0x00000000 0x00000000 0x00000000 3225 - 0x507d2494: 0x00000000 0x00000000 0x507d2fe0 0x00000000 3226 - 0x507d24a4: 0x00000000 0x00000000 0x00000000 0x00000000 3227 - 0x507d24b4: 0x00000000 0x00000000 0x00000000 0x00000000 3228 - 0x507d24c4: 0x00000000 0x00000000 0x00000000 0x00000000 3229 - 0x507d24d4: 0x00000000 0x00000000 0x00000000 0x00000000 3230 - 0x507d24e4: 0x00000000 0x00000000 0x00000000 0x00000000 3231 - 0x507d24f4: 0x00000000 0x00000000 0x00000000 0x00000000 3232 - 0x507d2504: 0x00000000 0x00000000 0x00000000 0x00000000 3233 - 0x507d2514: 0x00000000 0x00000000 0x00000000 0x00000000 3234 - 0x507d2524: 0x00000000 0x00000000 0x00000000 0x00000000 3235 - 0x507d2534: 0x00000000 0x00000000 0x507d25dc 0x00000000 3236 - 0x507d2544: 0x00000000 0x00000000 0x00000000 0x00000000 3237 - 0x507d2554: 0x00000000 0x00000000 
0x00000000 0x00000000 3238 - 0x507d2564: 0x00000000 0x00000000 0x00000000 0x00000000 3239 - 0x507d2574: 0x00000000 0x00000000 0x00000000 0x00000000 3240 - 0x507d2584: 0x00000000 0x00000000 0x00000000 0x00000000 3241 - 0x507d2594: 0x00000000 0x00000000 0x00000000 0x00000000 3242 - 0x507d25a4: 0x00000000 0x00000000 0x00000000 0x00000000 3243 - 0x507d25b4: 0x00000000 0x00000000 0x00000000 0x00000000 3244 - 3245 - 3345 + (gdb) x/100x $25 3346 + 0x507d2434: 0x507d2434 0x00000000 0x08048000 0x080a4f8c 3347 + 0x507d2444: 0x00000000 0x080a79e0 0x080a8c94 0x080d1000 3348 + 0x507d2454: 0xaffffdb0 0xaffffe63 0xaffffe7a 0xaffffe7a 3349 + 0x507d2464: 0xafffffec 0x00000062 0x0000008a 0x00000000 3350 + 0x507d2474: 0x00000000 0x00000000 0x00000000 0x00000000 3351 + 0x507d2484: 0x00000000 0x00000000 0x00000000 0x00000000 3352 + 0x507d2494: 0x00000000 0x00000000 0x507d2fe0 0x00000000 3353 + 0x507d24a4: 0x00000000 0x00000000 0x00000000 0x00000000 3354 + 0x507d24b4: 0x00000000 0x00000000 0x00000000 0x00000000 3355 + 0x507d24c4: 0x00000000 0x00000000 0x00000000 0x00000000 3356 + 0x507d24d4: 0x00000000 0x00000000 0x00000000 0x00000000 3357 + 0x507d24e4: 0x00000000 0x00000000 0x00000000 0x00000000 3358 + 0x507d24f4: 0x00000000 0x00000000 0x00000000 0x00000000 3359 + 0x507d2504: 0x00000000 0x00000000 0x00000000 0x00000000 3360 + 0x507d2514: 0x00000000 0x00000000 0x00000000 0x00000000 3361 + 0x507d2524: 0x00000000 0x00000000 0x00000000 0x00000000 3362 + 0x507d2534: 0x00000000 0x00000000 0x507d25dc 0x00000000 3363 + 0x507d2544: 0x00000000 0x00000000 0x00000000 0x00000000 3364 + 0x507d2554: 0x00000000 0x00000000 0x00000000 0x00000000 3365 + 0x507d2564: 0x00000000 0x00000000 0x00000000 0x00000000 3366 + 0x507d2574: 0x00000000 0x00000000 0x00000000 0x00000000 3367 + 0x507d2584: 0x00000000 0x00000000 0x00000000 0x00000000 3368 + 0x507d2594: 0x00000000 0x00000000 0x00000000 0x00000000 3369 + 0x507d25a4: 0x00000000 0x00000000 0x00000000 0x00000000 3370 + 0x507d25b4: 0x00000000 0x00000000 
0x00000000 0x00000000 3246 3371 3247 3372 3248 3373 ··· 3242 3399 on will be somewhat clearer. 3243 3400 3244 3401 3245 - 12.2. Episode 2: The case of the hung fsck 3402 + 12.2. Episode 2: The case of the hung fsck 3403 + ------------------------------------------- 3246 3404 3247 3405 After setting a trap in the SEGV handler for accesses to the signal 3248 3406 thread's stack, I reran the kernel. 3249 3407 3250 3408 3251 - fsck hung again, this time by hitting the trap: 3409 + fsck hung again, this time by hitting the trap:: 3252 3410 3253 3411 3254 3412 3413 + Setting hostname uml [ OK ] 3414 + Checking root filesystem 3415 + /dev/fhd0 contains a file system with errors, check forced. 3416 + Error reading block 86894 (Attempt to read block from filesystem resulted in short read) while reading indirect blocks of inode 19780. 3255 3417 3418 + /dev/fhd0: UNEXPECTED INCONSISTENCY; RUN fsck MANUALLY. 3419 + (i.e., without -a or -p options) 3420 + [ FAILED ] 3256 3421 3422 + *** An error occurred during the file system check. 3423 + *** Dropping you to a shell; the system will reboot 3424 + *** when you leave the shell. 3425 + Give root password for maintenance 3426 + (or type Control-D for normal startup): 3257 3427 3428 + [root@uml /root]# fsck -y /dev/fhd0 3429 + fsck -y /dev/fhd0 3430 + Parallelizing fsck version 1.14 (9-Jan-1999) 3431 + e2fsck 1.14, 9-Jan-1999 for EXT2 FS 0.5b, 95/08/09 3432 + /dev/fhd0 contains a file system with errors, check forced. 3433 + Pass 1: Checking inodes, blocks, and sizes 3434 + Error reading block 86894 (Attempt to read block from filesystem resulted in short read) while reading indirect blocks of inode 19780. Ignore error? yes 3258 3435 3436 + Pass 2: Checking directory structure 3437 + Error reading block 49405 (Attempt to read block from filesystem resulted in short read). Ignore error? yes 3259 3438 3439 + Directory inode 11858, block 0, offset 0: directory corrupted 3440 + Salvage? yes 3260 3441 3442 + Missing '.' 
in directory inode 11858. 3443 + Fix? yes 3261 3444 3445 + Missing '..' in directory inode 11858. 3446 + Fix? yes 3262 3447 3263 - 3264 - 3265 - 3266 - 3267 - 3268 - Setting hostname uml [ OK ] 3269 - Checking root filesystem 3270 - /dev/fhd0 contains a file system with errors, check forced. 3271 - Error reading block 86894 (Attempt to read block from filesystem resulted in short read) while reading indirect blocks of inode 19780. 3272 - 3273 - /dev/fhd0: UNEXPECTED INCONSISTENCY; RUN fsck MANUALLY. 3274 - (i.e., without -a or -p options) 3275 - [ FAILED ] 3276 - 3277 - *** An error occurred during the file system check. 3278 - *** Dropping you to a shell; the system will reboot 3279 - *** when you leave the shell. 3280 - Give root password for maintenance 3281 - (or type Control-D for normal startup): 3282 - 3283 - [root@uml /root]# fsck -y /dev/fhd0 3284 - fsck -y /dev/fhd0 3285 - Parallelizing fsck version 1.14 (9-Jan-1999) 3286 - e2fsck 1.14, 9-Jan-1999 for EXT2 FS 0.5b, 95/08/09 3287 - /dev/fhd0 contains a file system with errors, check forced. 3288 - Pass 1: Checking inodes, blocks, and sizes 3289 - Error reading block 86894 (Attempt to read block from filesystem resulted in short read) while reading indirect blocks of inode 19780. Ignore error? yes 3290 - 3291 - Pass 2: Checking directory structure 3292 - Error reading block 49405 (Attempt to read block from filesystem resulted in short read). Ignore error? yes 3293 - 3294 - Directory inode 11858, block 0, offset 0: directory corrupted 3295 - Salvage? yes 3296 - 3297 - Missing '.' in directory inode 11858. 3298 - Fix? yes 3299 - 3300 - Missing '..' in directory inode 11858. 3301 - Fix? yes 3302 - 3303 - Untested (4127) [100fe44c]: trap_kern.c line 31 3448 + Untested (4127) [100fe44c]: trap_kern.c line 31 3304 3449 3305 3450 3306 3451 ··· 3296 3465 3297 3466 I need to get the signal thread to detach from pid 4127 so that I can 3298 3467 attach to it with gdb. 
This is done by sending it a SIGUSR1, which is 3299 - caught by the signal thread, which detaches the process: 3468 + caught by the signal thread, which detaches the process:: 3300 3469 3301 3470 3302 3471 kill -USR1 4127 ··· 3305 3474 3306 3475 3307 3476 3308 - Now I can run gdb on it: 3477 + Now I can run gdb on it:: 3309 3478 3310 3479 3311 - 3312 - 3313 - 3314 - 3315 - 3316 - 3317 - 3318 - 3319 - 3320 - 3321 - 3322 - ~/linux/2.3.26/um 1034: gdb linux 3323 - GNU gdb 4.17.0.11 with Linux support 3324 - Copyright 1998 Free Software Foundation, Inc. 3325 - GDB is free software, covered by the GNU General Public License, and you are 3326 - welcome to change it and/or distribute copies of it under certain conditions. 3327 - Type "show copying" to see the conditions. 3328 - There is absolutely no warranty for GDB. Type "show warranty" for details. 3329 - This GDB was configured as "i386-redhat-linux"... 3330 - (gdb) att 4127 3331 - Attaching to program `/home/dike/linux/2.3.26/um/linux', Pid 4127 3332 - 0x10075891 in __libc_nanosleep () 3480 + ~/linux/2.3.26/um 1034: gdb linux 3481 + GNU gdb 4.17.0.11 with Linux support 3482 + Copyright 1998 Free Software Foundation, Inc. 3483 + GDB is free software, covered by the GNU General Public License, and you are 3484 + welcome to change it and/or distribute copies of it under certain conditions. 3485 + Type "show copying" to see the conditions. 3486 + There is absolutely no warranty for GDB. Type "show warranty" for details. 3487 + This GDB was configured as "i386-redhat-linux"... 
3488 + (gdb) att 4127 3489 + Attaching to program `/home/dike/linux/2.3.26/um/linux', Pid 4127 3490 + 0x10075891 in __libc_nanosleep () 3333 3491 3334 3492 3335 3493 ··· 3326 3506 3327 3507 The backtrace shows that it was in a write and that the fault address 3328 3508 (address in frame 3) is 0x50000800, which is right in the middle of 3329 - the signal thread's stack page: 3509 + the signal thread's stack page:: 3330 3510 3331 3511 3332 3512 (gdb) bt ··· 3360 3540 3361 3541 3362 3542 3363 - 3364 - 3365 3543 Going up the stack to the segv_handler frame and looking at where in 3366 3544 the code the access happened shows that it happened near line 110 of 3367 - block_dev.c: 3545 + block_dev.c:: 3368 3546 3369 3547 3370 3548 3371 - 3372 - 3373 - 3374 - 3375 - 3376 - 3377 - (gdb) up 3378 - #1 0x1007584d in __sleep (seconds=1000000) 3379 - at ../sysdeps/unix/sysv/linux/sleep.c:78 3380 - ../sysdeps/unix/sysv/linux/sleep.c:78: No such file or directory. 3381 - (gdb) 3382 - #2 0x1006ce9a in stop () at user_util.c:191 3383 - 191 while(1) sleep(1000000); 3384 - (gdb) 3385 - #3 0x1006bf88 in segv (address=1342179328, is_write=2) at trap_kern.c:31 3386 - 31 KERN_UNTESTED(); 3387 - (gdb) 3388 - #4 0x1006c628 in segv_handler (sc=0x5006eaf8) at trap_user.c:174 3389 - 174 segv(sc->cr2, sc->err & 2); 3390 - (gdb) p *sc 3391 - $1 = {gs = 0, __gsh = 0, fs = 0, __fsh = 0, es = 43, __esh = 0, ds = 43, 3392 - __dsh = 0, edi = 1342179328, esi = 134973440, ebp = 1342631484, 3393 - esp = 1342630864, ebx = 256, edx = 0, ecx = 256, eax = 1024, trapno = 14, 3394 - err = 6, eip = 268550834, cs = 35, __csh = 0, eflags = 66070, 3395 - esp_at_signal = 1342630864, ss = 43, __ssh = 0, fpstate = 0x0, oldmask = 0, 3396 - cr2 = 1342179328} 3397 - (gdb) p (void *)268550834 3398 - $2 = (void *) 0x1001c2b2 3399 - (gdb) i sym $2 3400 - block_write + 1090 in section .text 3401 - (gdb) i line *$2 3402 - Line 209 of "/home/dike/linux/2.3.26/um/include/asm/arch/string.h" 3403 - starts at address 0x1001c2a1 
<block_write+1073> 3404 - and ends at 0x1001c2bf <block_write+1103>. 3405 - (gdb) i line *0x1001c2c0 3406 - Line 110 of "block_dev.c" starts at address 0x1001c2bf <block_write+1103> 3407 - and ends at 0x1001c2e3 <block_write+1139>. 3408 - 3409 - 3549 + (gdb) up 3550 + #1 0x1007584d in __sleep (seconds=1000000) 3551 + at ../sysdeps/unix/sysv/linux/sleep.c:78 3552 + ../sysdeps/unix/sysv/linux/sleep.c:78: No such file or directory. 3553 + (gdb) 3554 + #2 0x1006ce9a in stop () at user_util.c:191 3555 + 191 while(1) sleep(1000000); 3556 + (gdb) 3557 + #3 0x1006bf88 in segv (address=1342179328, is_write=2) at trap_kern.c:31 3558 + 31 KERN_UNTESTED(); 3559 + (gdb) 3560 + #4 0x1006c628 in segv_handler (sc=0x5006eaf8) at trap_user.c:174 3561 + 174 segv(sc->cr2, sc->err & 2); 3562 + (gdb) p *sc 3563 + $1 = {gs = 0, __gsh = 0, fs = 0, __fsh = 0, es = 43, __esh = 0, ds = 43, 3564 + __dsh = 0, edi = 1342179328, esi = 134973440, ebp = 1342631484, 3565 + esp = 1342630864, ebx = 256, edx = 0, ecx = 256, eax = 1024, trapno = 14, 3566 + err = 6, eip = 268550834, cs = 35, __csh = 0, eflags = 66070, 3567 + esp_at_signal = 1342630864, ss = 43, __ssh = 0, fpstate = 0x0, oldmask = 0, 3568 + cr2 = 1342179328} 3569 + (gdb) p (void *)268550834 3570 + $2 = (void *) 0x1001c2b2 3571 + (gdb) i sym $2 3572 + block_write + 1090 in section .text 3573 + (gdb) i line *$2 3574 + Line 209 of "/home/dike/linux/2.3.26/um/include/asm/arch/string.h" 3575 + starts at address 0x1001c2a1 <block_write+1073> 3576 + and ends at 0x1001c2bf <block_write+1103>. 3577 + (gdb) i line *0x1001c2c0 3578 + Line 110 of "block_dev.c" starts at address 0x1001c2bf <block_write+1103> 3579 + and ends at 0x1001c2e3 <block_write+1139>. 
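The decimal register values that gdb prints from the sigcontext can also be decoded outside the debugger. The following is a minimal sketch, not part of the original session: the helper name `describe_fault` is hypothetical, and the 4 KiB page size is an assumption that matches i386. It takes `cr2`, `err` and `eip` as printed by `p *sc` and derives the same facts the text reads off by hand.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages, as on i386


def describe_fault(cr2, err, eip):
    """Decode the sigcontext fields that the session above inspects.

    cr2 is the faulting address, err the page-fault error code and
    eip the instruction pointer, all as the decimal integers gdb prints.
    """
    return {
        "fault_addr": hex(cr2),
        "page_base": hex(cr2 & ~(PAGE_SIZE - 1)),
        "page_offset": hex(cr2 & (PAGE_SIZE - 1)),
        # segv_handler passes sc->err & 2 to segv() as is_write
        "is_write": err & 2,
        "ip": hex(eip),
    }


# Values copied from the `p *sc` output in frame 4.
info = describe_fault(cr2=1342179328, err=6, eip=268550834)
for name, value in info.items():
    print(name, value)
```

The result agrees with the session: the fault address is 0x50000800, which lies 0x800 bytes into the page at 0x50000000, the access was a write, and the ip is the 0x1001c2b2 that `i sym` resolves to block_write.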
3410 3580 3411 3581 3412 3582 3413 3583 Looking at the source shows that the fault happened during a call to 3414 - copy_from_user to copy the data into the kernel: 3584 + copy_from_user to copy the data into the kernel:: 3415 3585 3416 3586 3417 3587 107 count -= chars; ··· 3411 3601 3412 3602 3413 3603 3414 - 3415 - 3416 3604 p is the pointer which must contain 0x50000800, since buf contains 3417 - 0x80b8800 (frame 8 above). It is defined as: 3605 + 0x80b8800 (frame 8 above). It is defined as:: 3418 3606 3419 3607 3420 3608 p = offset + bh->b_data; ··· 3423 3615 3424 3616 I need to figure out what bh is, and it just so happens that bh is 3425 3617 passed as an argument to mark_buffer_uptodate and mark_buffer_dirty a 3426 - few lines later, so I do a little disassembly: 3618 + few lines later, so I do a little disassembly:: 3427 3619 3428 3620 3429 - 3430 - 3431 - (gdb) disas 0x1001c2bf 0x1001c2e0 3432 - Dump of assembler code from 0x1001c2bf to 0x1001c2d0: 3433 - 0x1001c2bf <block_write+1103>: addl %eax,0xc(%ebp) 3434 - 0x1001c2c2 <block_write+1106>: movl 0xfffffdd4(%ebp),%edx 3435 - 0x1001c2c8 <block_write+1112>: btsl $0x0,0x18(%edx) 3436 - 0x1001c2cd <block_write+1117>: btsl $0x1,0x18(%edx) 3437 - 0x1001c2d2 <block_write+1122>: sbbl %ecx,%ecx 3438 - 0x1001c2d4 <block_write+1124>: testl %ecx,%ecx 3439 - 0x1001c2d6 <block_write+1126>: jne 0x1001c2e3 <block_write+1139> 3440 - 0x1001c2d8 <block_write+1128>: pushl $0x0 3441 - 0x1001c2da <block_write+1130>: pushl %edx 3442 - 0x1001c2db <block_write+1131>: call 0x1001819c <__mark_buffer_dirty> 3443 - End of assembler dump. 
3621 + (gdb) disas 0x1001c2bf 0x1001c2e0 3622 + Dump of assembler code from 0x1001c2bf to 0x1001c2d0: 3623 + 0x1001c2bf <block_write+1103>: addl %eax,0xc(%ebp) 3624 + 0x1001c2c2 <block_write+1106>: movl 0xfffffdd4(%ebp),%edx 3625 + 0x1001c2c8 <block_write+1112>: btsl $0x0,0x18(%edx) 3626 + 0x1001c2cd <block_write+1117>: btsl $0x1,0x18(%edx) 3627 + 0x1001c2d2 <block_write+1122>: sbbl %ecx,%ecx 3628 + 0x1001c2d4 <block_write+1124>: testl %ecx,%ecx 3629 + 0x1001c2d6 <block_write+1126>: jne 0x1001c2e3 <block_write+1139> 3630 + 0x1001c2d8 <block_write+1128>: pushl $0x0 3631 + 0x1001c2da <block_write+1130>: pushl %edx 3632 + 0x1001c2db <block_write+1131>: call 0x1001819c <__mark_buffer_dirty> 3633 + End of assembler dump. 3444 3634 3445 3635 3446 3636 ··· 3446 3640 3447 3641 At that point, bh is in %edx (address 0x1001c2da), which is calculated 3448 3642 at 0x1001c2c2 as %ebp + 0xfffffdd4, so I figure exactly what that is, 3449 - taking %ebp from the sigcontext_struct above: 3643 + taking %ebp from the sigcontext_struct above:: 3450 3644 3451 3645 3452 3646 (gdb) p (void *)1342631484 ··· 3463 3657 3464 3658 3465 3659 Now, I look at the structure to see what's in it, and particularly, 3466 - what its b_data field contains: 3660 + what its b_data field contains:: 3467 3661 3468 3662 3469 3663 (gdb) p *((struct buffer_head *)0x50100200) ··· 3488 3682 3489 3683 The b_page field is a pointer to the page_struct representing the 3490 3684 0x50000000 page. 
Looking at it shows the kernel's idea of the state 3491 - of that page: 3685 + of that page:: 3492 3686 3493 3687 3494 3688 3495 - (gdb) p *$13.b_page 3496 - $17 = {list = {next = 0x50004a5c, prev = 0x100c5174}, mapping = 0x0, 3497 - index = 0, next_hash = 0x0, count = {counter = 1}, flags = 132, lru = { 3498 - next = 0x50008460, prev = 0x50019350}, wait = { 3499 - lock = <optimized out or zero length>, task_list = {next = 0x50004024, 3500 - prev = 0x50004024}, __magic = 1342193708, __creator = 0}, 3501 - pprev_hash = 0x0, buffers = 0x501002c0, virtual = 1342177280, 3502 - zone = 0x100c5160} 3689 + (gdb) p *$13.b_page 3690 + $17 = {list = {next = 0x50004a5c, prev = 0x100c5174}, mapping = 0x0, 3691 + index = 0, next_hash = 0x0, count = {counter = 1}, flags = 132, lru = { 3692 + next = 0x50008460, prev = 0x50019350}, wait = { 3693 + lock = <optimized out or zero length>, task_list = {next = 0x50004024, 3694 + prev = 0x50004024}, __magic = 1342193708, __creator = 0}, 3695 + pprev_hash = 0x0, buffers = 0x501002c0, virtual = 1342177280, 3696 + zone = 0x100c5160} 3503 3697 3504 3698 3505 3699 ··· 3508 3702 Some sanity-checking: the virtual field shows the "virtual" address of 3509 3703 this page, which in this kernel is the same as its "physical" address, 3510 3704 and the page_struct itself should be mem_map[0], since it represents 3511 - the first page of memory: 3705 + the first page of memory:: 3512 3706 3513 3707 3514 3708 ··· 3525 3719 3526 3720 3527 3721 Now to check out the page_struct itself. 
In particular, the flags 3528 - field shows whether the page is considered free or not: 3722 + field shows whether the page is considered free or not:: 3529 3723 3530 3724 3531 3725 (gdb) p (void *)132 ··· 3545 3739 3546 3740 3547 3741 In my setup_arch procedure, I have the following code which looks just 3548 - fine: 3742 + fine:: 3549 3743 3550 3744 3551 3745 ··· 3568 3762 3569 3763 3570 3764 Stepping into init_bootmem, and looking at bootmem_map before looking 3571 - at what it contains shows the following: 3765 + at what it contains shows the following:: 3572 3766 3573 3767 3574 3768 ··· 3594 3788 3595 3789 3596 3790 3597 - 13. What to do when UML doesn't work 3791 + 13. What to do when UML doesn't work 3792 + ===================================== 3598 3793 3599 3794 3600 3795 3601 3796 3602 - 13.1. Strange compilation errors when you build from source 3797 + 13.1. Strange compilation errors when you build from source 3798 + ------------------------------------------------------------ 3603 3799 3604 3800 As of test11, it is necessary to have "ARCH=um" in the environment or 3605 3801 on the make command line for all steps in building UML, including 3606 3802 clean, distclean, or mrproper, config, menuconfig, or xconfig, dep, 3607 3803 and linux. If you forget for any of them, the i386 build seems to 3608 - contaminate the UML build. If this happens, start from scratch with 3804 + contaminate the UML build. If this happens, start from scratch with:: 3609 3805 3610 3806 3611 3807 host% ··· 3619 3811 and repeat the build process with ARCH=um on all the steps. 3620 3812 3621 3813 3622 - See ``Compiling the kernel and modules'' for more details. 3814 + See :ref:`Compiling_the_kernel_and_modules` for more details. 3623 3815 3624 3816 3625 3817 Another cause of strange compilation errors is building UML in ··· 3632 3824 3633 3825 3634 3826 3635 - 13.3. A variety of panics and hangs with /tmp on a reiserfs filesys- 3636 - tem 3827 + 13.3. 
A variety of panics and hangs with /tmp on a reiserfs filesystem 3828 + ----------------------------------------------------------------------- 3637 3829 3638 3830 I saw this on reiserfs 3.5.21 and it seems to be fixed in 3.5.27. 3639 - Panics preceded by 3831 + Panics preceded by:: 3640 3832 3641 3833 3642 3834 Detaching pid nnnn ··· 3662 3854 3663 3855 3664 3856 3665 - 13.5. UML doesn't work when /tmp is an NFS filesystem 3857 + 13.5. UML doesn't work when /tmp is an NFS filesystem 3858 + ------------------------------------------------------ 3666 3859 3667 3860 This seems to be a similar situation with the ReiserFS problem above. 3668 3861 Some versions of NFS seems not to handle mmap correctly, which UML 3669 3862 depends on. The workaround is have /tmp be a non-NFS directory. 3670 3863 3671 3864 3672 - 13.6. UML hangs on boot when compiled with gprof support 3865 + 13.6. UML hangs on boot when compiled with gprof support 3866 + --------------------------------------------------------- 3673 3867 3674 3868 If you build UML with gprof support and, early in the boot, it does 3675 - this 3869 + this:: 3676 3870 3677 3871 3678 3872 kernel BUG at page_alloc.c:100! ··· 3688 3878 3689 3879 3690 3880 3691 - 13.7. syslogd dies with a SIGTERM on startup 3881 + 13.7. syslogd dies with a SIGTERM on startup 3882 + --------------------------------------------- 3692 3883 3693 3884 The exact boot error depends on the distribution that you're booting, 3694 - but Debian produces this: 3885 + but Debian produces this:: 3695 3886 3696 3887 3697 3888 /etc/rc2.d/S10sysklogd: line 49: 93 Terminated ··· 3702 3891 3703 3892 3704 3893 This is a syslogd bug. There's a race between a parent process 3705 - installing a signal handler and its child sending the signal. See 3706 - this uml-devel post <http://www.geocrawler.com/lists/3/Source- 3707 - Forge/709/0/6612801> for the details. 3894 + installing a signal handler and its child sending the signal. 
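The kind of race described for syslogd can be illustrated with a small sketch. This is not the syslogd source, and the function name `parent_survives_child_signal` is hypothetical. It assumes the signal involved is SIGTERM, as the error message suggests, and shows the safe ordering: the parent installs its handler before forking, so the child cannot send the signal while the default action (terminate) is still in place.

```python
import os
import signal
import time


def parent_survives_child_signal(sig=signal.SIGTERM, timeout=5.0):
    """Install the handler first, then fork a child that signals the parent.

    Returns the list of signal numbers the parent's handler received.
    """
    received = []

    def handler(signum, frame):
        received.append(signum)

    # Installing the handler BEFORE fork() closes the race window.
    previous = signal.signal(sig, handler)
    try:
        pid = os.fork()
        if pid == 0:
            # Child: notify the parent, then exit without running cleanup.
            os.kill(os.getppid(), sig)
            os._exit(0)
        os.waitpid(pid, 0)
        # Give the interpreter a bounded amount of time to run the handler.
        deadline = time.monotonic() + timeout
        while not received and time.monotonic() < deadline:
            time.sleep(0.01)
    finally:
        # Restore whatever handler was in place before the call.
        if previous is not None:
            signal.signal(sig, previous)
    return received


if __name__ == "__main__":
    print(parent_survives_child_signal())
```

If the `signal.signal()` call were moved after `os.fork()`, the child could deliver the signal first and the parent would be terminated, which is the failure mode the boot error points to.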
3708 3895 3709 3896 3710 3897 3711 - 13.8. TUN/TAP networking doesn't work on a 2.4 host 3898 + 13.8. TUN/TAP networking doesn't work on a 2.4 host 3899 + ---------------------------------------------------- 3712 3900 3713 - There are a couple of problems which were 3714 - <http://www.geocrawler.com/lists/3/SourceForge/597/0/> name="pointed 3715 - out"> by Tim Robinson <timro at trkr dot net> 3901 + There are a couple of problems which were reported by 3902 + Tim Robinson <timro at trkr dot net> 3716 3903 3717 - o It doesn't work on hosts running 2.4.7 (or thereabouts) or earlier. 3904 + - It doesn't work on hosts running 2.4.7 (or thereabouts) or earlier. 3718 3905 The fix is to upgrade to something more recent and then read the 3719 3906 next item. 3720 3907 3721 - o If you see 3908 + - If you see:: 3722 3909 3723 3910 3724 3911 File descriptor in bad state ··· 3730 3921 3731 3922 3732 3923 3733 - 13.9. You can network to the host but not to other machines on the 3734 - net 3924 + 13.9. You can network to the host but not to other machines on the net 3925 + ----------------------------------------------------------------------- 3735 3926 3736 3927 If you can connect to the host, and the host can connect to UML, but 3737 3928 you cannot connect to any other machines, then you may need to enable ··· 3739 3930 using private IP addresses (192.168.x.x or 10.x.x.x) for host/UML 3740 3931 networking, rather than the public address space that your host is 3741 3932 connected to. UML does not enable IP Masquerading, so you will need 3742 - to create a static rule to enable it: 3933 + to create a static rule to enable it:: 3743 3934 3744 3935 3745 3936 host% ··· 3753 3944 3754 3945 3755 3946 Documentation on IP Masquerading, and SNAT, can be found at 3756 - www.netfilter.org <http://www.netfilter.org> . 3947 + http://www.netfilter.org. 3757 3948 3758 3949 3759 3950 If you can reach the local net, but not the outside Internet, then 3760 - that is usually a routing problem. 
The UML needs a default route: 3951 + that is usually a routing problem. The UML needs a default route:: 3761 3952 3762 3953 3763 3954 UML# ··· 3781 3972 3782 3973 3783 3974 3784 - 13.10. I have no root and I want to scream 3975 + 13.10. I have no root and I want to scream 3976 + ------------------------------------------- 3785 3977 3786 3978 Thanks to Birgit Wahlich for telling me about this strange one. It 3787 3979 turns out that there's a limit of six environment variables on the ··· 3797 3987 3798 3988 3799 3989 3800 - 13.11. UML build conflict between ptrace.h and ucontext.h 3990 + 13.11. UML build conflict between ptrace.h and ucontext.h 3991 + ---------------------------------------------------------- 3801 3992 3802 3993 On some older systems, /usr/include/asm/ptrace.h and 3803 3994 /usr/include/sys/ucontext.h define the same names. So, when they're 3804 3995 included together, the defines from one completely mess up the parsing 3805 - of the other, producing errors like: 3996 + of the other, producing errors like:: 3997 + 3806 3998 /usr/include/sys/ucontext.h:47: parse error before 3807 - `10' 3999 + `10' 3808 4000 3809 4001 3810 4002 ··· 3819 4007 3820 4008 3821 4009 3822 - 13.12. The UML BogoMips is exactly half the host's BogoMips 4010 + 13.12. The UML BogoMips is exactly half the host's BogoMips 4011 + ------------------------------------------------------------ 3823 4012 3824 4013 On i386 kernels, there are two ways of running the loop that is used 3825 4014 to calculate the BogoMips rating, using the TSC if it's there or using ··· 3832 4019 3833 4020 3834 4021 3835 - 13.13. When you run UML, it immediately segfaults 4022 + 13.13. When you run UML, it immediately segfaults 4023 + -------------------------------------------------- 3836 4024 3837 4025 If the host is configured with the 2G/2G address space split, that's 3838 - why. See ``UML on 2G/2G hosts'' for the details on getting UML to 4026 + why. 
See ref:`UML_on_2G/2G_hosts` for the details on getting UML to 3839 4027 run on your host. 3840 4028 3841 4029 3842 4030 3843 - 13.14. xterms appear, then immediately disappear 4031 + 13.14. xterms appear, then immediately disappear 4032 + ------------------------------------------------- 3844 4033 3845 4034 If you're running an up to date kernel with an old release of 3846 4035 uml_utilities, the port-helper program will not work properly, so ··· 3854 4039 3855 4040 3856 4041 3857 - 13.15. Any other panic, hang, or strange behavior 4042 + 13.15. Any other panic, hang, or strange behavior 4043 + -------------------------------------------------- 3858 4044 3859 4045 If you're seeing truly strange behavior, such as hangs or panics that 3860 4046 happen in random places, or you try running the debugger to see what's ··· 3873 4057 it and that a fix is imminent. 3874 4058 3875 4059 3876 - If you want to be super-helpful, read ``Diagnosing Problems'' and 4060 + If you want to be super-helpful, read :ref:`Diagnosing_Problems` and 3877 4061 follow the instructions contained therein. 3878 - 14. Diagnosing Problems 4062 + 4063 + .. _Diagnosing_Problems: 4064 + 4065 + 14. Diagnosing Problems 4066 + ======================== 3879 4067 3880 4068 3881 4069 If you get UML to crash, hang, or otherwise misbehave, you should ··· 3894 4074 3895 4075 For any diagnosis, you're going to need to build a debugging kernel. 3896 4076 The binaries from this site aren't debuggable. If you haven't done 3897 - this before, read about ``Compiling the kernel and modules'' and 3898 - ``Kernel debugging'' UML first. 4077 + this before, read about :ref:`Compiling_the_kernel_and_modules` and 4078 + :ref:`Kernel_debugging` UML first. 3899 4079 3900 4080 3901 - 14.1. Case 1 : Normal kernel panics 4081 + 14.1. Case 1 : Normal kernel panics 4082 + ------------------------------------ 3902 4083 3903 4084 The most common case is for a normal thread to panic. 
To debug this, 3904 4085 you will need to run it under the debugger (add 'debug' to the command 3905 4086 line). An xterm will start up with gdb running inside it. Continue 3906 - it when it stops in start_kernel and make it crash. Now ^C gdb and 4087 + it when it stops in start_kernel and make it crash. Now ``^C gdb`` and 3907 4088 3908 4089 3909 4090 If the panic was a "Kernel mode fault", then there will be a segv 3910 4091 frame on the stack and I'm going to want some more information. The 3911 - stack might look something like this: 4092 + stack might look something like this:: 3912 4093 3913 4094 3914 4095 (UML gdb) backtrace ··· 3928 4107 3929 4108 3930 4109 I'm going to want to see the symbol and line information for the value 3931 - of ip in the segv frame. In this case, you would do the following: 4110 + of ip in the segv frame. In this case, you would do the following:: 3932 4111 3933 4112 3934 4113 (UML gdb) i sym 268849158 ··· 3936 4115 3937 4116 3938 4117 3939 - and 4118 + and:: 3940 4119 3941 4120 3942 4121 (UML gdb) i line *268849158 ··· 3949 4128 to get that information from the faulting ip. 3950 4129 3951 4130 3952 - 14.2. Case 2 : Tracing thread panics 4131 + 14.2. Case 2 : Tracing thread panics 4132 + ------------------------------------- 3953 4133 3954 4134 The less common and more painful case is when the tracing thread 3955 4135 panics. In this case, the kernel debugger will be useless because it ··· 3958 4136 do is get a backtrace from the tracing thread. This is done by 3959 4137 figuring out what its pid is, firing up gdb, and attaching it to that 3960 4138 pid. 
You can figure out the tracing thread pid by looking at the 3961 - first line of the console output, which will look like this: 4139 + first line of the console output, which will look like this:: 3962 4140 3963 4141 3964 4142 tracing thread pid = 15851 ··· 3967 4145 3968 4146 3969 4147 or by running ps on the host and finding the line that looks like 3970 - this: 4148 + this:: 3971 4149 3972 4150 3973 4151 jdike 15851 4.5 0.4 132568 1104 pts/0 S 21:34 0:05 ./linux [(tracing thread)] ··· 3986 4164 14.3. Case 3 : Tracing thread panics caused by other threads 3987 4165 3988 4166 However, there are cases where the misbehavior of another thread 3989 - caused the problem. The most common panic of this type is: 4167 + caused the problem. The most common panic of this type is:: 3990 4168 3991 4169 3992 4170 wait_for_stop failed to wait for <pid> to stop with <signal number> ··· 3999 4177 debugger is defunct and without some fancy footwork, another gdb can't 4000 4178 attach to it. So, this is how the fancy footwork goes: 4001 4179 4002 - In a shell: 4180 + In a shell:: 4003 4181 4004 4182 4005 4183 host% kill -STOP pid ··· 4007 4185 4008 4186 4009 4187 4010 - Run gdb on the tracing thread as described in case 2 and do: 4188 + Run gdb on the tracing thread as described in case 2 and do:: 4011 4189 4012 4190 4013 4191 (host gdb) call detach(pid) ··· 4015 4193 4016 4194 If you get a segfault, do it again. It always works the second time. 
4017 4195 4018 - Detach from the tracing thread and attach to that other thread: 4196 + Detach from the tracing thread and attach to that other thread:: 4019 4197 4020 4198 4021 4199 (host gdb) detach ··· 4031 4209 4032 4210 4033 4211 If gdb hangs when attaching to that process, go back to a shell and 4034 - do: 4212 + do:: 4035 4213 4036 4214 4037 4215 host% ··· 4040 4218 4041 4219 4042 4220 4043 - And then get the backtrace: 4221 + And then get the backtrace:: 4044 4222 4045 4223 4046 4224 (host gdb) backtrace ··· 4049 4227 4050 4228 4051 4229 4052 - 14.4. Case 4 : Hangs 4230 + 14.4. Case 4 : Hangs 4231 + --------------------- 4053 4232 4054 4233 Hangs seem to be fairly rare, but they sometimes happen. When a hang 4055 4234 happens, we need a backtrace from the offending process. Run the 4056 4235 kernel debugger as described in case 1 and get a backtrace. If the 4057 4236 current process is not the idle thread, then send in the backtrace. 4058 - You can tell that it's the idle thread if the stack looks like this: 4237 + You can tell that it's the idle thread if the stack looks like this:: 4059 4238 4060 4239 4061 4240 #0 0x100b1401 in __libc_nanosleep () ··· 4080 4257 4081 4258 4082 4259 4083 - 15. Thanks 4260 + 15. Thanks 4261 + =========== 4084 4262 4085 4263 4086 4264 A number of people have helped this project in various ways, and this ··· 4098 4274 bookkeeping lapses and I forget about contributions. 4099 4275 4100 4276 4101 - 15.1. Code and Documentation 4277 + 15.1. 
Code and Documentation 4278 + ----------------------------- 4102 4279 4103 4280 Rusty Russell <rusty at linuxcare.com.au> - 4104 4281 4105 - o wrote the HOWTO <http://user-mode- 4106 - linux.sourceforge.net/UserModeLinux-HOWTO.html> 4282 + - wrote the HOWTO 4283 + http://user-mode-linux.sourceforge.net/old/UserModeLinux-HOWTO.html 4107 4284 4108 - o prodded me into making this project official and putting it on 4285 + - prodded me into making this project official and putting it on 4109 4286 SourceForge 4110 4287 4111 - o came up with the way cool UML logo <http://user-mode- 4112 - linux.sourceforge.net/uml-small.png> 4288 + - came up with the way cool UML logo 4289 + http://user-mode-linux.sourceforge.net/uml-small.png 4113 4290 4114 - o redid the config process 4291 + - redid the config process 4115 4292 4116 4293 4117 4294 Peter Moulder <reiter at netspace.net.au> - Fixed my config and build ··· 4121 4296 4122 4297 Bill Stearns <wstearns at pobox.com> - 4123 4298 4124 - o HOWTO updates 4299 + - HOWTO updates 4125 4300 4126 - o lots of bug reports 4301 + - lots of bug reports 4127 4302 4128 - o lots of testing 4303 + - lots of testing 4129 4304 4130 - o dedicated a box (uml.ists.dartmouth.edu) to support UML development 4305 + - dedicated a box (uml.ists.dartmouth.edu) to support UML development 4131 4306 4132 - o wrote the mkrootfs script, which allows bootable filesystems of 4307 + - wrote the mkrootfs script, which allows bootable filesystems of 4133 4308 RPM-based distributions to be cranked out 4134 4309 4135 - o cranked out a large number of filesystems with said script 4310 + - cranked out a large number of filesystems with said script 4136 4311 4137 4312 4138 4313 Jim Leu <jleu at mindspring.com> - Wrote the virtual ethernet driver 4139 4314 and associated usermode tools 4140 4315 4141 - Lars Brinkhoff <http://lars.nocrew.org/> - Contributed the ptrace 4142 - proxy from his own project <http://a386.nocrew.org/> to allow easier 4143 - kernel debugging 4316 
+ Lars Brinkhoff http://lars.nocrew.org/ - Contributed the ptrace 4317 + proxy from his own project to allow easier kernel debugging 4144 4318 4145 4319 4146 4320 Andrea Arcangeli <andrea at suse.de> - Redid some of the early boot 4147 4321 code so that it would work on machines with Large File Support 4148 4322 4149 4323 4150 - Chris Emerson <http://www.chiark.greenend.org.uk/~cemerson/> - Did 4151 - the first UML port to Linux/ppc 4324 + Chris Emerson - Did the first UML port to Linux/ppc 4152 4325 4153 4326 4154 4327 Harald Welte <laforge at gnumonks.org> - Wrote the multicast ··· 4161 4338 wrote the iomem emulation support 4162 4339 4163 4340 4164 - Henrik Nordstrom <http://hem.passagen.se/hno/> - Provided a variety 4341 + Henrik Nordstrom http://hem.passagen.se/hno/ - Provided a variety 4165 4342 of patches, fixes, and clues 4166 4343 4167 4344 ··· 4196 4373 submitted patches for the slip transport and lots of other things. 4197 4374 4198 4375 4199 - David Coulson <http://davidcoulson.net> - 4376 + David Coulson http://davidcoulson.net - 4200 4377 4201 - o Set up the usermodelinux.org <http://usermodelinux.org> site, 4378 + - Set up the http://usermodelinux.org site, 4202 4379 which is a great way of keeping the UML user community on top of 4203 4380 UML goings-on. 4204 4381 4205 - o Site documentation and updates 4382 + - Site documentation and updates 4206 4383 4207 - o Nifty little UML management daemon UMLd 4208 - <http://uml.openconsultancy.com/umld/> 4384 + - Nifty little UML management daemon UMLd 4209 4385 4210 - o Lots of testing and bug reports 4386 + - Lots of testing and bug reports 4211 4387 4212 4388 4213 4389 4214 4390 4215 - 15.2. Flushing out bugs 4391 + 15.2. 
Flushing out bugs 4392 + ------------------------ 4216 4393 4217 4394 4218 4395 4219 - o Yuri Pudgorodsky 4396 + - Yuri Pudgorodsky 4220 4397 4221 - o Gerald Britton 4398 + - Gerald Britton 4222 4399 4223 - o Ian Wehrman 4400 + - Ian Wehrman 4224 4401 4225 - o Gord Lamb 4402 + - Gord Lamb 4226 4403 4227 - o Eugene Koontz 4404 + - Eugene Koontz 4228 4405 4229 - o John H. Hartman 4406 + - John H. Hartman 4230 4407 4231 - o Anders Karlsson 4408 + - Anders Karlsson 4232 4409 4233 - o Daniel Phillips 4410 + - Daniel Phillips 4234 4411 4235 - o John Fremlin 4412 + - John Fremlin 4236 4413 4237 - o Rainer Burgstaller 4414 + - Rainer Burgstaller 4238 4415 4239 - o James Stevenson 4416 + - James Stevenson 4240 4417 4241 - o Matt Clay 4418 + - Matt Clay 4242 4419 4243 - o Cliff Jefferies 4420 + - Cliff Jefferies 4244 4421 4245 - o Geoff Hoff 4422 + - Geoff Hoff 4246 4423 4247 - o Lennert Buytenhek 4424 + - Lennert Buytenhek 4248 4425 4249 - o Al Viro 4426 + - Al Viro 4250 4427 4251 - o Frank Klingenhoefer 4428 + - Frank Klingenhoefer 4252 4429 4253 - o Livio Baldini Soares 4430 + - Livio Baldini Soares 4254 4431 4255 - o Jon Burgess 4432 + - Jon Burgess 4256 4433 4257 - o Petru Paler 4434 + - Petru Paler 4258 4435 4259 - o Paul 4436 + - Paul 4260 4437 4261 - o Chris Reahard 4438 + - Chris Reahard 4262 4439 4263 - o Sverker Nilsson 4440 + - Sverker Nilsson 4264 4441 4265 - o Gong Su 4442 + - Gong Su 4266 4443 4267 - o johan verrept 4444 + - johan verrept 4268 4445 4269 - o Bjorn Eriksson 4446 + - Bjorn Eriksson 4270 4447 4271 - o Lorenzo Allegrucci 4448 + - Lorenzo Allegrucci 4272 4449 4273 - o Muli Ben-Yehuda 4450 + - Muli Ben-Yehuda 4274 4451 4275 - o David Mansfield 4452 + - David Mansfield 4276 4453 4277 - o Howard Goff 4454 + - Howard Goff 4278 4455 4279 - o Mike Anderson 4456 + - Mike Anderson 4280 4457 4281 - o John Byrne 4458 + - John Byrne 4282 4459 4283 - o Sapan J. Batia 4460 + - Sapan J. 
Batia 4284 4461 4285 - o Iris Huang 4462 + - Iris Huang 4286 4463 4287 - o Jan Hudec 4464 + - Jan Hudec 4288 4465 4289 - o Voluspa 4466 + - Voluspa 4290 4467 4291 4468 4292 4469 4293 4470 4294 - 15.3. Buglets and clean-ups 4471 + 15.3. Buglets and clean-ups 4472 + ---------------------------- 4295 4473 4296 4474 4297 4475 4298 - o Dave Zarzycki 4476 + - Dave Zarzycki 4299 4477 4300 - o Adam Lazur 4478 + - Adam Lazur 4301 4479 4302 - o Boria Feigin 4480 + - Boria Feigin 4303 4481 4304 - o Brian J. Murrell 4482 + - Brian J. Murrell 4305 4483 4306 - o JS 4484 + - JS 4307 4485 4308 - o Roman Zippel 4486 + - Roman Zippel 4309 4487 4310 - o Wil Cooley 4488 + - Wil Cooley 4311 4489 4312 - o Ayelet Shemesh 4490 + - Ayelet Shemesh 4313 4491 4314 - o Will Dyson 4492 + - Will Dyson 4315 4493 4316 - o Sverker Nilsson 4494 + - Sverker Nilsson 4317 4495 4318 - o dvorak 4496 + - dvorak 4319 4497 4320 - o v.naga srinivas 4498 + - v.naga srinivas 4321 4499 4322 - o Shlomi Fish 4500 + - Shlomi Fish 4323 4501 4324 - o Roger Binns 4502 + - Roger Binns 4325 4503 4326 - o johan verrept 4504 + - johan verrept 4327 4505 4328 - o MrChuoi 4506 + - MrChuoi 4329 4507 4330 - o Peter Cleve 4508 + - Peter Cleve 4331 4509 4332 - o Vincent Guffens 4510 + - Vincent Guffens 4333 4511 4334 - o Nathan Scott 4512 + - Nathan Scott 4335 4513 4336 - o Patrick Caulfield 4514 + - Patrick Caulfield 4337 4515 4338 - o jbearce 4516 + - jbearce 4339 4517 4340 - o Catalin Marinas 4518 + - Catalin Marinas 4341 4519 4342 - o Shane Spencer 4520 + - Shane Spencer 4343 4521 4344 - o Zou Min 4522 + - Zou Min 4345 4523 4346 4524 4347 - o Ryan Boder 4525 + - Ryan Boder 4348 4526 4349 - o Lorenzo Colitti 4527 + - Lorenzo Colitti 4350 4528 4351 - o Gwendal Grignou 4529 + - Gwendal Grignou 4352 4530 4353 - o Andre' Breiler 4531 + - Andre' Breiler 4354 4532 4355 - o Tsutomu Yasuda 4533 + - Tsutomu Yasuda 4356 4534 4357 4535 4358 4536 4359 - 15.4. Case Studies 4537 + 15.4. 
Case Studies 4538 + ------------------- 4360 4539 4361 4540 4362 - o Jon Wright 4541 + - Jon Wright 4363 4542 4364 - o William McEwan 4543 + - William McEwan 4365 4544 4366 - o Michael Richardson 4545 + - Michael Richardson 4367 4546 4368 4547 4369 4548 4370 - 15.5. Other contributions 4549 + 15.5. Other contributions 4550 + -------------------------- 4371 4551 4372 4552 4373 4553 Bill Carr <Bill.Carr at compaq.com> made the Red Hat mkrootfs script 4374 4554 work with RH 6.2. 4375 4555 4376 4556 Michael Jennings <mikejen at hevanet.com> sent in some material which 4377 - is now gracing the top of the index page <http://user-mode- 4378 - linux.sourceforge.net/> of this site. 4557 + is now gracing the top of the index page 4558 + http://user-mode-linux.sourceforge.net/ of this site. 4379 4559 4380 - SGI <http://www.sgi.com> (and more specifically Ralf Baechle <ralf at 4381 - uni-koblenz.de> ) gave me an account on oss.sgi.com 4382 - <http://www.oss.sgi.com> . The bandwidth there made it possible to 4560 + SGI (and more specifically Ralf Baechle <ralf at 4561 + uni-koblenz.de> ) gave me an account on oss.sgi.com. 4562 + The bandwidth there made it possible to 4383 4563 produce most of the filesystems available on the project download 4384 4564 page. 4385 4565 ··· 4399 4573 4400 4574 Chris Reahard built a specialized root filesystem for running a DNS 4401 4575 server jailed inside UML. It's available from the download 4402 - <http://user-mode-linux.sourceforge.net/dl-sf.html> page in the Jail 4576 + http://user-mode-linux.sourceforge.net/old/dl-sf.html page in the Jail 4403 4577 Filesystems section. 4404 - 4405 - 4406 - 4407 - 4408 - 4409 - 4410 - 4411 - 4412 - 4413 - 4414 - 4415 -

+9 -3

Documentation/virtual/guest-halt-polling.txt Documentation/virt/guest-halt-polling.rst

··· 1 + ================== 1 2 Guest halt polling 2 3 ================== 3 4 4 5 The cpuidle_haltpoll driver, with the haltpoll governor, allows 5 6 the guest vcpus to poll for a specified amount of time before 6 7 halting. 8 + 7 9 This provides the following benefits to host side polling: 8 10 9 11 1) The POLL flag is set while polling is performed, which allows ··· 31 29 The haltpoll governor has 5 tunable module parameters: 32 30 33 31 1) guest_halt_poll_ns: 32 + 34 33 Maximum amount of time, in nanoseconds, that polling is 35 34 performed before halting. 36 35 37 36 Default: 200000 38 37 39 38 2) guest_halt_poll_shrink: 39 + 40 40 Division factor used to shrink per-cpu guest_halt_poll_ns when 41 41 wakeup event occurs after the global guest_halt_poll_ns. 42 42 43 43 Default: 2 44 44 45 45 3) guest_halt_poll_grow: 46 + 46 47 Multiplication factor used to grow per-cpu guest_halt_poll_ns 47 48 when event occurs after per-cpu guest_halt_poll_ns 48 49 but before global guest_halt_poll_ns. ··· 53 48 Default: 2 54 49 55 50 4) guest_halt_poll_grow_start: 51 + 56 52 The per-cpu guest_halt_poll_ns eventually reaches zero 57 53 in case of an idle system. This value sets the initial 58 54 per-cpu guest_halt_poll_ns when growing. This can ··· 72 66 73 67 Default: Y 74 68 75 - The module parameters can be set from the debugfs files in: 69 + The module parameters can be set from the debugfs files in:: 76 70 77 71 /sys/module/haltpoll/parameters/ 78 72 ··· 80 74 ============= 81 75 82 76 - Care should be taken when setting the guest_halt_poll_ns parameter as a 83 - large value has the potential to drive the cpu usage to 100% on a machine which 84 - would be almost entirely idle otherwise. 77 + large value has the potential to drive the cpu usage to 100% on a machine 78 + which would be almost entirely idle otherwise.
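The grow and shrink rules described for the tunables above can be sketched in plain C. This is only an illustration of the documented behaviour, not the governor's actual code: `next_poll_ns` and `struct haltpoll_params` are hypothetical names, and the `guest_halt_poll_grow_start` value in the usage note is an assumption, since its default is not visible in the hunk.

```c
#include <stdint.h>

/* Tunables, named after the module parameters in the documentation. */
struct haltpoll_params {
        uint64_t guest_halt_poll_ns;         /* global upper bound */
        uint64_t guest_halt_poll_shrink;     /* division factor */
        uint64_t guest_halt_poll_grow;       /* multiplication factor */
        uint64_t guest_halt_poll_grow_start; /* first window when growing from 0 */
};

/*
 * Hypothetical helper: return the next per-cpu poll window, given the
 * current window and how long the vcpu was idle before the wakeup event.
 */
uint64_t next_poll_ns(const struct haltpoll_params *p,
                      uint64_t cur_ns, uint64_t block_ns)
{
        if (block_ns > cur_ns && block_ns <= p->guest_halt_poll_ns) {
                /* Event came after the per-cpu window but within the global one: grow. */
                uint64_t val = cur_ns * p->guest_halt_poll_grow;

                if (val < p->guest_halt_poll_grow_start)
                        val = p->guest_halt_poll_grow_start;
                if (val > p->guest_halt_poll_ns)
                        val = p->guest_halt_poll_ns;
                return val;
        }
        if (block_ns > p->guest_halt_poll_ns) {
                /* Event came after the global window: shrink. */
                if (p->guest_halt_poll_shrink == 0)
                        return 0;
                return cur_ns / p->guest_halt_poll_shrink;
        }
        return cur_ns; /* the event arrived while polling; keep the window */
}
```

With the documented defaults (limit 200000, shrink 2, grow 2) and an assumed grow start of 50000, a window of 0 grows to 50000 after an event that arrives 100000 ns into the idle period, and a window of 200000 shrinks to 100000 after a 500000 ns idle period.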

+13 -3

arch/x86/include/asm/kvm_host.h

···
         u64 msr_kvm_poll_control;

         /*
-         * Indicate whether the access faults on its page table in guest
-         * which is set when fix page fault and used to detect unhandeable
-         * instruction.
+         * Indicates the guest is trying to write a gfn that contains one or
+         * more of the PTEs used to translate the write itself, i.e. the access
+         * is changing its own translation in the guest page tables. KVM exits
+         * to userspace if emulation of the faulting instruction fails and this
+         * flag is set, as KVM cannot make forward progress.
+         *
+         * If emulation fails for a write to guest page tables, KVM unprotects
+         * (zaps) the shadow page for the target gfn and resumes the guest to
+         * retry the non-emulatable instruction (on hardware). Unprotecting the
+         * gfn doesn't allow forward progress for a self-changing access because
+         * doing so also zaps the translation for the gfn, i.e. retrying the
+         * instruction will hit a !PRESENT fault, which results in a new shadow
+         * page and sends KVM back to square one.
          */
         bool write_fault_to_shadow_pgtable;


-3

arch/x86/kvm/lapic.c

···
                         result = 1;
                         /* assumes that there are only KVM_APIC_INIT/SIPI */
                         apic->pending_events = (1UL << KVM_APIC_INIT);
-                        /* make sure pending_events is visible before sending
-                         * the request */
-                        smp_wmb();
                         kvm_make_request(KVM_REQ_EVENT, vcpu);
                         kvm_vcpu_kick(vcpu);
                 }

+13

arch/x86/kvm/mmu.h

···
                                               kvm_get_active_pcid(vcpu));
 }

+int kvm_tdp_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code,
+                       bool prefault);
+
+static inline int kvm_mmu_do_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
+                                        u32 err, bool prefault)
+{
+#ifdef CONFIG_RETPOLINE
+        if (likely(vcpu->arch.mmu->page_fault == kvm_tdp_page_fault))
+                return kvm_tdp_page_fault(vcpu, cr2_or_gpa, err, prefault);
+#endif
+        return vcpu->arch.mmu->page_fault(vcpu, cr2_or_gpa, err, prefault);
+}
+
 /*
  * Currently, we have two sorts of write-protection, a) the first one
  * write-protects guest page to sync the guest modification, b) another one is
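The `kvm_mmu_do_page_fault()` helper added above compares the function pointer against the expected common target and calls that target directly, so an indirect call (which is costly when retpolines are enabled) is only taken on the uncommon path. The standalone sketch below shows the same pattern outside the kernel. The handler names are hypothetical stand-ins, and `__builtin_expect` (a GCC/Clang builtin) is assumed here as a substitute for the kernel's `likely()`.

```c
#include <stdbool.h>

typedef int (*fault_handler_t)(unsigned long addr, bool prefault);

/* Hypothetical stand-in for the handler that is installed most of the time. */
int common_handler(unsigned long addr, bool prefault)
{
        (void)prefault;
        return (int)(addr & 0xff);
}

/* Hypothetical stand-in for a less common handler. */
int rare_handler(unsigned long addr, bool prefault)
{
        (void)addr;
        return prefault ? -1 : -2;
}

/*
 * Call the common handler directly when the pointer matches it, and fall
 * back to the indirect call otherwise. Both paths give the same result;
 * only the kind of call instruction differs.
 */
int do_fault(fault_handler_t handler, unsigned long addr, bool prefault)
{
        if (__builtin_expect(handler == common_handler, 1))
                return common_handler(addr, prefault); /* direct call */
        return handler(addr, prefault);                /* indirect call */
}
```

The comparison costs one predictable branch, which is the trade the helper makes only when `CONFIG_RETPOLINE` is set.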

+5 -6

arch/x86/kvm/mmu/mmu.c

···
 }
 EXPORT_SYMBOL_GPL(kvm_handle_page_fault);

-static int tdp_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code,
-                          bool prefault)
+int kvm_tdp_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code,
+                       bool prefault)
 {
         int max_level;

···
                 return;

         context->mmu_role.as_u64 = new_role.as_u64;
-        context->page_fault = tdp_page_fault;
+        context->page_fault = kvm_tdp_page_fault;
         context->sync_page = nonpaging_sync_page;
         context->invlpg = nonpaging_invlpg;
         context->update_pte = nonpaging_update_pte;
···
         }

         if (r == RET_PF_INVALID) {
-                r = vcpu->arch.mmu->page_fault(vcpu, cr2_or_gpa,
-                                               lower_32_bits(error_code),
-                                               false);
+                r = kvm_mmu_do_page_fault(vcpu, cr2_or_gpa,
+                                          lower_32_bits(error_code), false);
                 WARN_ON(r == RET_PF_INVALID);
         }


+1 -1

arch/x86/kvm/mmu/paging_tmpl.h

···
 #define PT_GUEST_ACCESSED_SHIFT PT_ACCESSED_SHIFT
 #define PT_HAVE_ACCESSED_DIRTY(mmu) true
 #ifdef CONFIG_X86_64
-#define PT_MAX_FULL_LEVELS 4
+#define PT_MAX_FULL_LEVELS PT64_ROOT_MAX_LEVEL
 #define CMPXCHG cmpxchg
 #else
 #define CMPXCHG cmpxchg64

+1 -1

arch/x86/kvm/svm.c

···
         u32 dummy;
         u32 eax = 1;

-        vcpu->arch.microcode_version = 0x01000065;
         svm->spec_ctrl = 0;
         svm->virt_spec_ctrl = 0;

···
         init_vmcb(svm);

         svm_init_osvw(vcpu);
+        vcpu->arch.microcode_version = 0x01000065;

         return 0;


+31 -2

arch/x86/kvm/vmx/nested.c

···
         }
 }

-static inline void enable_x2apic_msr_intercepts(unsigned long *msr_bitmap) {
+static inline void enable_x2apic_msr_intercepts(unsigned long *msr_bitmap)
+{
         int msr;

         for (msr = 0x800; msr <= 0x8ff; msr += BITS_PER_LONG) {
···
         }

         /*
-         * Clean fields data can't de used on VMLAUNCH and when we switch
+         * Clean fields data can't be used on VMLAUNCH and when we switch
          * between different L2 guests as KVM keeps a single VMCS12 per L1.
          */
         if (from_launch || evmcs_gpa_changed)
···
         nested_vmx_vmexit(vcpu, EXIT_REASON_EXCEPTION_NMI, intr_info, exit_qual);
 }

+/*
+ * Returns true if a debug trap is pending delivery.
+ *
+ * In KVM, debug traps bear an exception payload. As such, the class of a #DB
+ * exception may be inferred from the presence of an exception payload.
+ */
+static inline bool vmx_pending_dbg_trap(struct kvm_vcpu *vcpu)
+{
+        return vcpu->arch.exception.pending &&
+                        vcpu->arch.exception.nr == DB_VECTOR &&
+                        vcpu->arch.exception.payload;
+}
+
+/*
+ * Certain VM-exits set the 'pending debug exceptions' field to indicate a
+ * recognized #DB (data or single-step) that has yet to be delivered. Since KVM
+ * represents these debug traps with a payload that is said to be compatible
+ * with the 'pending debug exceptions' field, write the payload to the VMCS
+ * field if a VM-exit is delivered before the debug trap.
+ */
+static void nested_vmx_update_pending_dbg(struct kvm_vcpu *vcpu)
+{
+        if (vmx_pending_dbg_trap(vcpu))
+                vmcs_writel(GUEST_PENDING_DBG_EXCEPTIONS,
+                            vcpu->arch.exception.payload);
+}
+
 static int vmx_check_nested_events(struct kvm_vcpu *vcpu, bool external_intr)
 {
         struct vcpu_vmx *vmx = to_vmx(vcpu);
···
                 test_bit(KVM_APIC_INIT, &apic->pending_events)) {
                 if (block_nested_events)
                         return -EBUSY;
+                nested_vmx_update_pending_dbg(vcpu);
                 clear_bit(KVM_APIC_INIT, &apic->pending_events);
                 nested_vmx_vmexit(vcpu, EXIT_REASON_INIT_SIGNAL, 0, 0);
                 return 0;

+4 -1

arch/x86/kvm/vmx/vmx.c

···

 static int get_ept_level(struct kvm_vcpu *vcpu)
 {
+        /* Nested EPT currently only supports 4-level walks. */
+        if (is_guest_mode(vcpu) && nested_cpu_has_ept(get_vmcs12(vcpu)))
+                return 4;
         if (cpu_has_vmx_ept_5levels() && (cpuid_maxphyaddr(vcpu) > 48))
                 return 5;
         return 4;
···

         vmx->msr_ia32_umwait_control = 0;

-        vcpu->arch.microcode_version = 0x100000000ULL;
         vmx->vcpu.arch.regs[VCPU_REGS_RDX] = get_rdx_init_val();
         vmx->hv_deadline_tsc = -1;
         kvm_set_cr8(vcpu, 0);
···
         vmx->nested.posted_intr_nv = -1;
         vmx->nested.current_vmptr = -1ull;

+        vcpu->arch.microcode_version = 0x100000000ULL;
         vmx->msr_ia32_feature_control_valid_bits = FEAT_CTL_LOCKED;

         /*

+26 -16

arch/x86/kvm/x86.c

···
                  * for #DB exceptions under VMX.
                  */
                 vcpu->arch.dr6 ^= payload & DR6_RTM;
+
+                /*
+                 * The #DB payload is defined as compatible with the 'pending
+                 * debug exceptions' field under VMX, not DR6. While bit 12 is
+                 * defined in the 'pending debug exceptions' field (enabled
+                 * breakpoint), it is reserved and must be zero in DR6.
+                 */
+                vcpu->arch.dr6 &= ~BIT(12);
                 break;
         case PF_VECTOR:
                 vcpu->arch.cr2 = payload;
···
                 vcpu->arch.exception.error_code = error_code;
                 vcpu->arch.exception.has_payload = has_payload;
                 vcpu->arch.exception.payload = payload;
-                /*
-                 * In guest mode, payload delivery should be deferred,
-                 * so that the L1 hypervisor can intercept #PF before
-                 * CR2 is modified (or intercept #DB before DR6 is
-                 * modified under nVMX). However, for ABI
-                 * compatibility with KVM_GET_VCPU_EVENTS and
-                 * KVM_SET_VCPU_EVENTS, we can't delay payload
-                 * delivery unless userspace has enabled this
-                 * functionality via the per-VM capability,
-                 * KVM_CAP_EXCEPTION_PAYLOAD.
-                 */
-                if (!vcpu->kvm->arch.exception_payload_enabled ||
-                    !is_guest_mode(vcpu))
+                if (!is_guest_mode(vcpu))
                         kvm_deliver_exception_payload(vcpu);
                 return;
         }
···
         vcpu->hv_clock.tsc_timestamp = tsc_timestamp;
         vcpu->hv_clock.system_time = kernel_ns + v->kvm->arch.kvmclock_offset;
         vcpu->last_guest_tsc = tsc_timestamp;
-        WARN_ON(vcpu->hv_clock.system_time < 0);
+        WARN_ON((s64)vcpu->hv_clock.system_time < 0);

         /* If the host uses TSC clocksource, then it is stable */
         pvclock_flags = 0;
···
                                                struct kvm_vcpu_events *events)
 {
         process_nmi(vcpu);
+
+        /*
+         * In guest mode, payload delivery should be deferred,
+         * so that the L1 hypervisor can intercept #PF before
+         * CR2 is modified (or intercept #DB before DR6 is
+         * modified under nVMX). Unless the per-VM capability,
+         * KVM_CAP_EXCEPTION_PAYLOAD, is set, we may not defer the delivery of
+         * an exception payload and handle after a KVM_GET_VCPU_EVENTS. Since we
+         * opportunistically defer the exception payload, deliver it if the
+         * capability hasn't been requested before processing a
+         * KVM_GET_VCPU_EVENTS.
+         */
+        if (!vcpu->kvm->arch.exception_payload_enabled &&
+            vcpu->arch.exception.pending && vcpu->arch.exception.has_payload)
+                kvm_deliver_exception_payload(vcpu);

         /*
          * The API doesn't provide the instruction length for software
···

         kvm_rip_write(vcpu, ctxt->eip);
         kvm_set_rflags(vcpu, ctxt->eflags);
-        kvm_make_request(KVM_REQ_EVENT, vcpu);
         return 1;
 }
 EXPORT_SYMBOL_GPL(kvm_task_switch);
···
               work->arch.cr3 != vcpu->arch.mmu->get_cr3(vcpu))
                 return;

-        vcpu->arch.mmu->page_fault(vcpu, work->cr2_or_gpa, 0, true);
+        kvm_mmu_do_page_fault(vcpu, work->cr2_or_gpa, 0, true);
 }

 static inline u32 kvm_async_pf_hash_fn(gfn_t gfn)
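The `(s64)` cast added to the `WARN_ON()` on `hv_clock.system_time` in the hunk above matters if that field is an unsigned 64-bit value, which the cast suggests: an unsigned value compared with `< 0` is never true, so the old check could not fire. The sketch below illustrates the effect in userspace C. The helper names are hypothetical, and the conversion to `int64_t` assumes the usual two's complement behaviour of GCC and Clang.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper mirroring "kernel_ns + kvmclock_offset": the sum is
 * stored in an unsigned field, so a negative result wraps to a huge value.
 */
uint64_t compute_system_time(uint64_t kernel_ns, int64_t kvmclock_offset)
{
        return kernel_ns + (uint64_t)kvmclock_offset;
}

/*
 * Reinterpret the unsigned value as signed before testing the sign.
 * Testing "system_time < 0" on the uint64_t itself would always be false.
 */
bool system_time_went_negative(uint64_t system_time)
{
        return (int64_t)system_time < 0;
}
```

For example, a `kernel_ns` of 100 with an offset of -200 wraps to a very large unsigned value that only the signed comparison reports as negative.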

+2 -1

tools/testing/selftests/kvm/Makefile

···
 UNAME_M := $(shell uname -m)

 LIBKVM = lib/assert.c lib/elf.c lib/io.c lib/kvm_util.c lib/sparsebit.c
-LIBKVM_x86_64 = lib/x86_64/processor.c lib/x86_64/vmx.c lib/x86_64/ucall.c
+LIBKVM_x86_64 = lib/x86_64/processor.c lib/x86_64/vmx.c lib/x86_64/svm.c lib/x86_64/ucall.c
 LIBKVM_aarch64 = lib/aarch64/processor.c lib/aarch64/ucall.c
 LIBKVM_s390x = lib/s390x/processor.c lib/s390x/ucall.c

···
 TEST_GEN_PROGS_x86_64 += x86_64/vmx_set_nested_state_test
 TEST_GEN_PROGS_x86_64 += x86_64/vmx_tsc_adjust_test
 TEST_GEN_PROGS_x86_64 += x86_64/xss_msr_test
+TEST_GEN_PROGS_x86_64 += x86_64/svm_vmcall_test
 TEST_GEN_PROGS_x86_64 += clear_dirty_log_test
 TEST_GEN_PROGS_x86_64 += dirty_log_test
 TEST_GEN_PROGS_x86_64 += kvm_create_max_vcpus

+22 -22

tools/testing/selftests/kvm/include/x86_64/processor.h

···
 #define X86_CR4_SMAP (1ul << 21)
 #define X86_CR4_PKE (1ul << 22)

-/* The enum values match the intruction encoding of each register */
-enum x86_register {
-        RAX = 0,
-        RCX,
-        RDX,
-        RBX,
-        RSP,
-        RBP,
-        RSI,
-        RDI,
-        R8,
-        R9,
-        R10,
-        R11,
-        R12,
-        R13,
-        R14,
-        R15,
+/* General Registers in 64-Bit Mode */
+struct gpr64_regs {
+        u64 rax;
+        u64 rcx;
+        u64 rdx;
+        u64 rbx;
+        u64 rsp;
+        u64 rbp;
+        u64 rsi;
+        u64 rdi;
+        u64 r8;
+        u64 r9;
+        u64 r10;
+        u64 r11;
+        u64 r12;
+        u64 r13;
+        u64 r14;
+        u64 r15;
 };

 struct desc64 {
···
         __asm__ __volatile__("mov %0, %%cr4" : : "r" (val) : "memory");
 }

-static inline uint64_t get_gdt_base(void)
+static inline struct desc_ptr get_gdt(void)
 {
         struct desc_ptr gdt;
         __asm__ __volatile__("sgdt %[gdt]"
                              : /* output */ [gdt]"=m"(gdt));
-        return gdt.address;
+        return gdt;
 }

-static inline uint64_t get_idt_base(void)
+static inline struct desc_ptr get_idt(void)
 {
         struct desc_ptr idt;
         __asm__ __volatile__("sidt %[idt]"
                              : /* output */ [idt]"=m"(idt));
-        return idt.address;
+        return idt;
 }

 #define SET_XMM(__var, __xmm) \

+297

tools/testing/selftests/kvm/include/x86_64/svm.h

@@ -0,0 +1,297 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * tools/testing/selftests/kvm/include/x86_64/svm.h
+ * This is a copy of arch/x86/include/asm/svm.h
+ *
+ */
+
+#ifndef SELFTEST_KVM_SVM_H
+#define SELFTEST_KVM_SVM_H
+
+enum {
+	INTERCEPT_INTR,
+	INTERCEPT_NMI,
+	INTERCEPT_SMI,
+	INTERCEPT_INIT,
+	INTERCEPT_VINTR,
+	INTERCEPT_SELECTIVE_CR0,
+	INTERCEPT_STORE_IDTR,
+	INTERCEPT_STORE_GDTR,
+	INTERCEPT_STORE_LDTR,
+	INTERCEPT_STORE_TR,
+	INTERCEPT_LOAD_IDTR,
+	INTERCEPT_LOAD_GDTR,
+	INTERCEPT_LOAD_LDTR,
+	INTERCEPT_LOAD_TR,
+	INTERCEPT_RDTSC,
+	INTERCEPT_RDPMC,
+	INTERCEPT_PUSHF,
+	INTERCEPT_POPF,
+	INTERCEPT_CPUID,
+	INTERCEPT_RSM,
+	INTERCEPT_IRET,
+	INTERCEPT_INTn,
+	INTERCEPT_INVD,
+	INTERCEPT_PAUSE,
+	INTERCEPT_HLT,
+	INTERCEPT_INVLPG,
+	INTERCEPT_INVLPGA,
+	INTERCEPT_IOIO_PROT,
+	INTERCEPT_MSR_PROT,
+	INTERCEPT_TASK_SWITCH,
+	INTERCEPT_FERR_FREEZE,
+	INTERCEPT_SHUTDOWN,
+	INTERCEPT_VMRUN,
+	INTERCEPT_VMMCALL,
+	INTERCEPT_VMLOAD,
+	INTERCEPT_VMSAVE,
+	INTERCEPT_STGI,
+	INTERCEPT_CLGI,
+	INTERCEPT_SKINIT,
+	INTERCEPT_RDTSCP,
+	INTERCEPT_ICEBP,
+	INTERCEPT_WBINVD,
+	INTERCEPT_MONITOR,
+	INTERCEPT_MWAIT,
+	INTERCEPT_MWAIT_COND,
+	INTERCEPT_XSETBV,
+	INTERCEPT_RDPRU,
+};
+
+
+struct __attribute__ ((__packed__)) vmcb_control_area {
+	u32 intercept_cr;
+	u32 intercept_dr;
+	u32 intercept_exceptions;
+	u64 intercept;
+	u8 reserved_1[40];
+	u16 pause_filter_thresh;
+	u16 pause_filter_count;
+	u64 iopm_base_pa;
+	u64 msrpm_base_pa;
+	u64 tsc_offset;
+	u32 asid;
+	u8 tlb_ctl;
+	u8 reserved_2[3];
+	u32 int_ctl;
+	u32 int_vector;
+	u32 int_state;
+	u8 reserved_3[4];
+	u32 exit_code;
+	u32 exit_code_hi;
+	u64 exit_info_1;
+	u64 exit_info_2;
+	u32 exit_int_info;
+	u32 exit_int_info_err;
+	u64 nested_ctl;
+	u64 avic_vapic_bar;
+	u8 reserved_4[8];
+	u32 event_inj;
+	u32 event_inj_err;
+	u64 nested_cr3;
+	u64 virt_ext;
+	u32 clean;
+	u32 reserved_5;
+	u64 next_rip;
+	u8 insn_len;
+	u8 insn_bytes[15];
+	u64 avic_backing_page;	/* Offset 0xe0 */
+	u8 reserved_6[8];	/* Offset 0xe8 */
+	u64 avic_logical_id;	/* Offset 0xf0 */
+	u64 avic_physical_id;	/* Offset 0xf8 */
+	u8 reserved_7[768];
+};
+
+
+#define TLB_CONTROL_DO_NOTHING 0
+#define TLB_CONTROL_FLUSH_ALL_ASID 1
+#define TLB_CONTROL_FLUSH_ASID 3
+#define TLB_CONTROL_FLUSH_ASID_LOCAL 7
+
+#define V_TPR_MASK 0x0f
+
+#define V_IRQ_SHIFT 8
+#define V_IRQ_MASK (1 << V_IRQ_SHIFT)
+
+#define V_GIF_SHIFT 9
+#define V_GIF_MASK (1 << V_GIF_SHIFT)
+
+#define V_INTR_PRIO_SHIFT 16
+#define V_INTR_PRIO_MASK (0x0f << V_INTR_PRIO_SHIFT)
+
+#define V_IGN_TPR_SHIFT 20
+#define V_IGN_TPR_MASK (1 << V_IGN_TPR_SHIFT)
+
+#define V_INTR_MASKING_SHIFT 24
+#define V_INTR_MASKING_MASK (1 << V_INTR_MASKING_SHIFT)
+
+#define V_GIF_ENABLE_SHIFT 25
+#define V_GIF_ENABLE_MASK (1 << V_GIF_ENABLE_SHIFT)
+
+#define AVIC_ENABLE_SHIFT 31
+#define AVIC_ENABLE_MASK (1 << AVIC_ENABLE_SHIFT)
+
+#define LBR_CTL_ENABLE_MASK BIT_ULL(0)
+#define VIRTUAL_VMLOAD_VMSAVE_ENABLE_MASK BIT_ULL(1)
+
+#define SVM_INTERRUPT_SHADOW_MASK 1
+
+#define SVM_IOIO_STR_SHIFT 2
+#define SVM_IOIO_REP_SHIFT 3
+#define SVM_IOIO_SIZE_SHIFT 4
+#define SVM_IOIO_ASIZE_SHIFT 7
+
+#define SVM_IOIO_TYPE_MASK 1
+#define SVM_IOIO_STR_MASK (1 << SVM_IOIO_STR_SHIFT)
+#define SVM_IOIO_REP_MASK (1 << SVM_IOIO_REP_SHIFT)
+#define SVM_IOIO_SIZE_MASK (7 << SVM_IOIO_SIZE_SHIFT)
+#define SVM_IOIO_ASIZE_MASK (7 << SVM_IOIO_ASIZE_SHIFT)
+
+#define SVM_VM_CR_VALID_MASK 0x001fULL
+#define SVM_VM_CR_SVM_LOCK_MASK 0x0008ULL
+#define SVM_VM_CR_SVM_DIS_MASK 0x0010ULL
+
+#define SVM_NESTED_CTL_NP_ENABLE BIT(0)
+#define SVM_NESTED_CTL_SEV_ENABLE BIT(1)
+
+struct __attribute__ ((__packed__)) vmcb_seg {
+	u16 selector;
+	u16 attrib;
+	u32 limit;
+	u64 base;
+};
+
+struct __attribute__ ((__packed__)) vmcb_save_area {
+	struct vmcb_seg es;
+	struct vmcb_seg cs;
+	struct vmcb_seg ss;
+	struct vmcb_seg ds;
+	struct vmcb_seg fs;
+	struct vmcb_seg gs;
+	struct vmcb_seg gdtr;
+	struct vmcb_seg ldtr;
+	struct vmcb_seg idtr;
+	struct vmcb_seg tr;
+	u8 reserved_1[43];
+	u8 cpl;
+	u8 reserved_2[4];
+	u64 efer;
+	u8 reserved_3[112];
+	u64 cr4;
+	u64 cr3;
+	u64 cr0;
+	u64 dr7;
+	u64 dr6;
+	u64 rflags;
+	u64 rip;
+	u8 reserved_4[88];
+	u64 rsp;
+	u8 reserved_5[24];
+	u64 rax;
+	u64 star;
+	u64 lstar;
+	u64 cstar;
+	u64 sfmask;
+	u64 kernel_gs_base;
+	u64 sysenter_cs;
+	u64 sysenter_esp;
+	u64 sysenter_eip;
+	u64 cr2;
+	u8 reserved_6[32];
+	u64 g_pat;
+	u64 dbgctl;
+	u64 br_from;
+	u64 br_to;
+	u64 last_excp_from;
+	u64 last_excp_to;
+};
+
+struct __attribute__ ((__packed__)) vmcb {
+	struct vmcb_control_area control;
+	struct vmcb_save_area save;
+};
+
+#define SVM_CPUID_FUNC 0x8000000a
+
+#define SVM_VM_CR_SVM_DISABLE 4
+
+#define SVM_SELECTOR_S_SHIFT 4
+#define SVM_SELECTOR_DPL_SHIFT 5
+#define SVM_SELECTOR_P_SHIFT 7
+#define SVM_SELECTOR_AVL_SHIFT 8
+#define SVM_SELECTOR_L_SHIFT 9
+#define SVM_SELECTOR_DB_SHIFT 10
+#define SVM_SELECTOR_G_SHIFT 11
+
+#define SVM_SELECTOR_TYPE_MASK (0xf)
+#define SVM_SELECTOR_S_MASK (1 << SVM_SELECTOR_S_SHIFT)
+#define SVM_SELECTOR_DPL_MASK (3 << SVM_SELECTOR_DPL_SHIFT)
+#define SVM_SELECTOR_P_MASK (1 << SVM_SELECTOR_P_SHIFT)
+#define SVM_SELECTOR_AVL_MASK (1 << SVM_SELECTOR_AVL_SHIFT)
+#define SVM_SELECTOR_L_MASK (1 << SVM_SELECTOR_L_SHIFT)
+#define SVM_SELECTOR_DB_MASK (1 << SVM_SELECTOR_DB_SHIFT)
+#define SVM_SELECTOR_G_MASK (1 << SVM_SELECTOR_G_SHIFT)
+
+#define SVM_SELECTOR_WRITE_MASK (1 << 1)
+#define SVM_SELECTOR_READ_MASK SVM_SELECTOR_WRITE_MASK
+#define SVM_SELECTOR_CODE_MASK (1 << 3)
+
+#define INTERCEPT_CR0_READ 0
+#define INTERCEPT_CR3_READ 3
+#define INTERCEPT_CR4_READ 4
+#define INTERCEPT_CR8_READ 8
+#define INTERCEPT_CR0_WRITE (16 + 0)
+#define INTERCEPT_CR3_WRITE (16 + 3)
+#define INTERCEPT_CR4_WRITE (16 + 4)
+#define INTERCEPT_CR8_WRITE (16 + 8)
+
+#define INTERCEPT_DR0_READ 0
+#define INTERCEPT_DR1_READ 1
+#define INTERCEPT_DR2_READ 2
+#define INTERCEPT_DR3_READ 3
+#define INTERCEPT_DR4_READ 4
+#define INTERCEPT_DR5_READ 5
+#define INTERCEPT_DR6_READ 6
+#define INTERCEPT_DR7_READ 7
+#define INTERCEPT_DR0_WRITE (16 + 0)
+#define INTERCEPT_DR1_WRITE (16 + 1)
+#define INTERCEPT_DR2_WRITE (16 + 2)
+#define INTERCEPT_DR3_WRITE (16 + 3)
+#define INTERCEPT_DR4_WRITE (16 + 4)
+#define INTERCEPT_DR5_WRITE (16 + 5)
+#define INTERCEPT_DR6_WRITE (16 + 6)
+#define INTERCEPT_DR7_WRITE (16 + 7)
+
+#define SVM_EVTINJ_VEC_MASK 0xff
+
+#define SVM_EVTINJ_TYPE_SHIFT 8
+#define SVM_EVTINJ_TYPE_MASK (7 << SVM_EVTINJ_TYPE_SHIFT)
+
+#define SVM_EVTINJ_TYPE_INTR (0 << SVM_EVTINJ_TYPE_SHIFT)
+#define SVM_EVTINJ_TYPE_NMI (2 << SVM_EVTINJ_TYPE_SHIFT)
+#define SVM_EVTINJ_TYPE_EXEPT (3 << SVM_EVTINJ_TYPE_SHIFT)
+#define SVM_EVTINJ_TYPE_SOFT (4 << SVM_EVTINJ_TYPE_SHIFT)
+
+#define SVM_EVTINJ_VALID (1 << 31)
+#define SVM_EVTINJ_VALID_ERR (1 << 11)
+
+#define SVM_EXITINTINFO_VEC_MASK SVM_EVTINJ_VEC_MASK
+#define SVM_EXITINTINFO_TYPE_MASK SVM_EVTINJ_TYPE_MASK
+
+#define SVM_EXITINTINFO_TYPE_INTR SVM_EVTINJ_TYPE_INTR
+#define SVM_EXITINTINFO_TYPE_NMI SVM_EVTINJ_TYPE_NMI
+#define SVM_EXITINTINFO_TYPE_EXEPT SVM_EVTINJ_TYPE_EXEPT
+#define SVM_EXITINTINFO_TYPE_SOFT SVM_EVTINJ_TYPE_SOFT
+
+#define SVM_EXITINTINFO_VALID SVM_EVTINJ_VALID
+#define SVM_EXITINTINFO_VALID_ERR SVM_EVTINJ_VALID_ERR
+
+#define SVM_EXITINFOSHIFT_TS_REASON_IRET 36
+#define SVM_EXITINFOSHIFT_TS_REASON_JMP 38
+#define SVM_EXITINFOSHIFT_TS_HAS_ERROR_CODE 44
+
+#define SVM_EXITINFO_REG_MASK 0x0F
+
+#define SVM_CR0_SELECTIVE_MASK (X86_CR0_TS | X86_CR0_MP)
+
+#endif /* SELFTEST_KVM_SVM_H */
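
The SVM_EVTINJ_* constants in this header describe the layout of the 32-bit event_inj field: the vector in bits 7:0, the event type in bits 10:8, an error-code-valid flag at bit 11, and a valid flag at bit 31. A minimal sketch of composing and decoding such a value follows. The EVTINJ_* names and both helper functions are hypothetical and exist only for this illustration; the constants are written as unsigned values so that the bit-31 flag does not shift into the sign bit of an int.

```c
#include <assert.h>
#include <stdint.h>

/* Unsigned copies of the SVM_EVTINJ_* values (illustrative names, not the header's). */
#define EVTINJ_VEC_MASK   0xffu
#define EVTINJ_TYPE_SHIFT 8
#define EVTINJ_TYPE_MASK  (7u << EVTINJ_TYPE_SHIFT)
#define EVTINJ_TYPE_INTR  (0u << EVTINJ_TYPE_SHIFT)
#define EVTINJ_TYPE_NMI   (2u << EVTINJ_TYPE_SHIFT)
#define EVTINJ_TYPE_EXEPT (3u << EVTINJ_TYPE_SHIFT)
#define EVTINJ_TYPE_SOFT  (4u << EVTINJ_TYPE_SHIFT)
#define EVTINJ_VALID      (1u << 31)
#define EVTINJ_VALID_ERR  (1u << 11)

static_assert(EVTINJ_VALID == 0x80000000u, "valid flag occupies bit 31");

/* Hypothetical helper: encode an exception, optionally flagged as carrying an error code. */
static inline uint32_t evtinj_exception(uint8_t vector, int has_error_code)
{
	uint32_t v = (vector & EVTINJ_VEC_MASK) | EVTINJ_TYPE_EXEPT | EVTINJ_VALID;

	if (has_error_code)
		v |= EVTINJ_VALID_ERR;
	return v;
}

/* Hypothetical helpers: pull the vector and the (still shifted) type back out. */
static inline uint8_t evtinj_vector(uint32_t v)
{
	return (uint8_t)(v & EVTINJ_VEC_MASK);
}

static inline uint32_t evtinj_type(uint32_t v)
{
	return v & EVTINJ_TYPE_MASK;
}
```

Because the SVM_EXITINTINFO_* names are aliases of the SVM_EVTINJ_* values, the same decoding applies to exit_int_info.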

+38

tools/testing/selftests/kvm/include/x86_64/svm_util.h

@@ -0,0 +1,38 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * tools/testing/selftests/kvm/include/x86_64/svm_utils.h
+ * Header for nested SVM testing
+ *
+ * Copyright (C) 2020, Red Hat, Inc.
+ */
+
+#ifndef SELFTEST_KVM_SVM_UTILS_H
+#define SELFTEST_KVM_SVM_UTILS_H
+
+#include <stdint.h>
+#include "svm.h"
+#include "processor.h"
+
+#define CPUID_SVM_BIT 2
+#define CPUID_SVM BIT_ULL(CPUID_SVM_BIT)
+
+#define SVM_EXIT_VMMCALL 0x081
+
+struct svm_test_data {
+	/* VMCB */
+	struct vmcb *vmcb; /* gva */
+	void *vmcb_hva;
+	uint64_t vmcb_gpa;
+
+	/* host state-save area */
+	struct vmcb_save_area *save_area; /* gva */
+	void *save_area_hva;
+	uint64_t save_area_gpa;
+};
+
+struct svm_test_data *vcpu_alloc_svm(struct kvm_vm *vm, vm_vaddr_t *p_svm_gva);
+void generic_svm_setup(struct svm_test_data *svm, void *guest_rip, void *guest_rsp);
+void run_guest(struct vmcb *vmcb, uint64_t vmcb_gpa);
+void nested_svm_check_supported(void);
+
+#endif /* SELFTEST_KVM_SVM_UTILS_H */
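
nested_svm_check_supported(), declared in this header and defined in lib/x86_64/svm.c, tests the feature flag that CPUID_SVM_BIT names: bit 2 of ECX in CPUID leaf 0x80000001. The check reduces to a single mask test, sketched here as a standalone function. `ecx_has_svm` is a hypothetical name, and the ECX value is passed in as a parameter rather than read from KVM, so the snippet does not depend on the selftest library.

```c
#include <stdbool.h>
#include <stdint.h>

#define CPUID_SVM_BIT 2
/* Written out as a plain shift; the header uses BIT_ULL(CPUID_SVM_BIT). */
#define CPUID_SVM     (1ULL << CPUID_SVM_BIT)

/* Hypothetical helper: true when the SVM flag is set in the given ECX value. */
static inline bool ecx_has_svm(uint32_t ecx)
{
	return (ecx & CPUID_SVM) != 0;
}
```

In the selftest, the ECX value comes from kvm_get_supported_cpuid_entry(0x80000001), and a clear bit makes the test exit with KSFT_SKIP rather than fail.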

+161

tools/testing/selftests/kvm/lib/x86_64/svm.c

@@ -0,0 +1,161 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * tools/testing/selftests/kvm/lib/x86_64/svm.c
+ * Helpers used for nested SVM testing
+ * Largely inspired from KVM unit test svm.c
+ *
+ * Copyright (C) 2020, Red Hat, Inc.
+ */
+
+#include "test_util.h"
+#include "kvm_util.h"
+#include "../kvm_util_internal.h"
+#include "processor.h"
+#include "svm_util.h"
+
+struct gpr64_regs guest_regs;
+u64 rflags;
+
+/* Allocate memory regions for nested SVM tests.
+ *
+ * Input Args:
+ *   vm - The VM to allocate guest-virtual addresses in.
+ *
+ * Output Args:
+ *   p_svm_gva - The guest virtual address for the struct svm_test_data.
+ *
+ * Return:
+ *   Pointer to structure with the addresses of the SVM areas.
+ */
+struct svm_test_data *
+vcpu_alloc_svm(struct kvm_vm *vm, vm_vaddr_t *p_svm_gva)
+{
+	vm_vaddr_t svm_gva = vm_vaddr_alloc(vm, getpagesize(),
+					    0x10000, 0, 0);
+	struct svm_test_data *svm = addr_gva2hva(vm, svm_gva);
+
+	svm->vmcb = (void *)vm_vaddr_alloc(vm, getpagesize(),
+					   0x10000, 0, 0);
+	svm->vmcb_hva = addr_gva2hva(vm, (uintptr_t)svm->vmcb);
+	svm->vmcb_gpa = addr_gva2gpa(vm, (uintptr_t)svm->vmcb);
+
+	svm->save_area = (void *)vm_vaddr_alloc(vm, getpagesize(),
+						0x10000, 0, 0);
+	svm->save_area_hva = addr_gva2hva(vm, (uintptr_t)svm->save_area);
+	svm->save_area_gpa = addr_gva2gpa(vm, (uintptr_t)svm->save_area);
+
+	*p_svm_gva = svm_gva;
+	return svm;
+}
+
+static void vmcb_set_seg(struct vmcb_seg *seg, u16 selector,
+			 u64 base, u32 limit, u32 attr)
+{
+	seg->selector = selector;
+	seg->attrib = attr;
+	seg->limit = limit;
+	seg->base = base;
+}
+
+void generic_svm_setup(struct svm_test_data *svm, void *guest_rip, void *guest_rsp)
+{
+	struct vmcb *vmcb = svm->vmcb;
+	uint64_t vmcb_gpa = svm->vmcb_gpa;
+	struct vmcb_save_area *save = &vmcb->save;
+	struct vmcb_control_area *ctrl = &vmcb->control;
+	u32 data_seg_attr = 3 | SVM_SELECTOR_S_MASK | SVM_SELECTOR_P_MASK
+	      | SVM_SELECTOR_DB_MASK | SVM_SELECTOR_G_MASK;
+	u32 code_seg_attr = 9 | SVM_SELECTOR_S_MASK | SVM_SELECTOR_P_MASK
+		| SVM_SELECTOR_L_MASK | SVM_SELECTOR_G_MASK;
+	uint64_t efer;
+
+	efer = rdmsr(MSR_EFER);
+	wrmsr(MSR_EFER, efer | EFER_SVME);
+	wrmsr(MSR_VM_HSAVE_PA, svm->save_area_gpa);
+
+	memset(vmcb, 0, sizeof(*vmcb));
+	asm volatile ("vmsave\n\t" : : "a" (vmcb_gpa) : "memory");
+	vmcb_set_seg(&save->es, get_es(), 0, -1U, data_seg_attr);
+	vmcb_set_seg(&save->cs, get_cs(), 0, -1U, code_seg_attr);
+	vmcb_set_seg(&save->ss, get_ss(), 0, -1U, data_seg_attr);
+	vmcb_set_seg(&save->ds, get_ds(), 0, -1U, data_seg_attr);
+	vmcb_set_seg(&save->gdtr, 0, get_gdt().address, get_gdt().size, 0);
+	vmcb_set_seg(&save->idtr, 0, get_idt().address, get_idt().size, 0);
+
+	ctrl->asid = 1;
+	save->cpl = 0;
+	save->efer = rdmsr(MSR_EFER);
+	asm volatile ("mov %%cr4, %0" : "=r"(save->cr4) : : "memory");
+	asm volatile ("mov %%cr3, %0" : "=r"(save->cr3) : : "memory");
+	asm volatile ("mov %%cr0, %0" : "=r"(save->cr0) : : "memory");
+	asm volatile ("mov %%dr7, %0" : "=r"(save->dr7) : : "memory");
+	asm volatile ("mov %%dr6, %0" : "=r"(save->dr6) : : "memory");
+	asm volatile ("mov %%cr2, %0" : "=r"(save->cr2) : : "memory");
+	save->g_pat = rdmsr(MSR_IA32_CR_PAT);
+	save->dbgctl = rdmsr(MSR_IA32_DEBUGCTLMSR);
+	ctrl->intercept = (1ULL << INTERCEPT_VMRUN) |
+				(1ULL << INTERCEPT_VMMCALL);
+
+	vmcb->save.rip = (u64)guest_rip;
+	vmcb->save.rsp = (u64)guest_rsp;
+	guest_regs.rdi = (u64)svm;
+}
+
+/*
+ * save/restore 64-bit general registers except rax, rip, rsp
+ * which are directly handed through the VMCB guest processor state
+ */
+#define SAVE_GPR_C				\
+	"xchg %%rbx, guest_regs+0x20\n\t"	\
+	"xchg %%rcx, guest_regs+0x10\n\t"	\
+	"xchg %%rdx, guest_regs+0x18\n\t"	\
+	"xchg %%rbp, guest_regs+0x30\n\t"	\
+	"xchg %%rsi, guest_regs+0x38\n\t"	\
+	"xchg %%rdi, guest_regs+0x40\n\t"	\
+	"xchg %%r8,  guest_regs+0x48\n\t"	\
+	"xchg %%r9,  guest_regs+0x50\n\t"	\
+	"xchg %%r10, guest_regs+0x58\n\t"	\
+	"xchg %%r11, guest_regs+0x60\n\t"	\
+	"xchg %%r12, guest_regs+0x68\n\t"	\
+	"xchg %%r13, guest_regs+0x70\n\t"	\
+	"xchg %%r14, guest_regs+0x78\n\t"	\
+	"xchg %%r15, guest_regs+0x80\n\t"
+
+#define LOAD_GPR_C      SAVE_GPR_C
+
+/*
+ * selftests do not use interrupts so we dropped clgi/sti/cli/stgi
+ * for now. registers involved in LOAD/SAVE_GPR_C are eventually
+ * unmodified so they do not need to be in the clobber list.
+ */
+void run_guest(struct vmcb *vmcb, uint64_t vmcb_gpa)
+{
+	asm volatile (
+		"vmload\n\t"
+		"mov rflags, %%r15\n\t"	// rflags
+		"mov %%r15, 0x170(%[vmcb])\n\t"
+		"mov guest_regs, %%r15\n\t"	// rax
+		"mov %%r15, 0x1f8(%[vmcb])\n\t"
+		LOAD_GPR_C
+		"vmrun\n\t"
+		SAVE_GPR_C
+		"mov 0x170(%[vmcb]), %%r15\n\t"	// rflags
+		"mov %%r15, rflags\n\t"
+		"mov 0x1f8(%[vmcb]), %%r15\n\t"	// rax
+		"mov %%r15, guest_regs\n\t"
+		"vmsave\n\t"
+		: : [vmcb] "r" (vmcb), [vmcb_gpa] "a" (vmcb_gpa)
+		: "r15", "memory");
+}
+
+void nested_svm_check_supported(void)
+{
+	struct kvm_cpuid_entry2 *entry =
+		kvm_get_supported_cpuid_entry(0x80000001);
+
+	if (!(entry->ecx & CPUID_SVM)) {
+		fprintf(stderr, "nested SVM not enabled, skipping test\n");
+		exit(KSFT_SKIP);
+	}
+}
+
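
run_guest() addresses rflags and rax through the literal displacements 0x170 and 0x1f8. Those numbers equal the offsets of `rflags` and `rax` inside struct vmcb_save_area as declared in svm.h, which can be confirmed at compile time. The sketch below mirrors only the leading part of that structure, using stdint types; the `seg` and `save_area` names are stand-ins for this illustration and are not the selftest's own definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for struct vmcb_seg: 16 bytes when packed. */
struct __attribute__((__packed__)) seg {
	uint16_t selector;
	uint16_t attrib;
	uint32_t limit;
	uint64_t base;
};

/* Stand-in for the start of struct vmcb_save_area, in the header's field order. */
struct __attribute__((__packed__)) save_area {
	struct seg es, cs, ss, ds, fs, gs, gdtr, ldtr, idtr, tr;
	uint8_t reserved_1[43];
	uint8_t cpl;
	uint8_t reserved_2[4];
	uint64_t efer;
	uint8_t reserved_3[112];
	uint64_t cr4, cr3, cr0, dr7, dr6, rflags, rip;
	uint8_t reserved_4[88];
	uint64_t rsp;
	uint8_t reserved_5[24];
	uint64_t rax;
	/* Fields after rax are omitted; they do not affect the offsets checked here. */
};

/* Compile-time checks: the build fails if the layout ever drifts. */
static_assert(offsetof(struct save_area, rflags) == 0x170, "rflags offset");
static_assert(offsetof(struct save_area, rax) == 0x1f8, "rax offset");
```

Expressing such displacements with offsetof() instead of literals is one way to keep inline assembly in step with a structure definition, at the cost of passing the values in as extra asm operands.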

+3 -3

tools/testing/selftests/kvm/lib/x86_64/vmx.c

@@ -288,9 +288,9 @@
 	vmwrite(HOST_FS_BASE, rdmsr(MSR_FS_BASE));
 	vmwrite(HOST_GS_BASE, rdmsr(MSR_GS_BASE));
 	vmwrite(HOST_TR_BASE,
-		get_desc64_base((struct desc64 *)(get_gdt_base() + get_tr())));
-	vmwrite(HOST_GDTR_BASE, get_gdt_base());
-	vmwrite(HOST_IDTR_BASE, get_idt_base());
+		get_desc64_base((struct desc64 *)(get_gdt().address + get_tr())));
+	vmwrite(HOST_GDTR_BASE, get_gdt().address);
+	vmwrite(HOST_IDTR_BASE, get_idt().address);
 	vmwrite(HOST_IA32_SYSENTER_ESP, rdmsr(MSR_IA32_SYSENTER_ESP));
 	vmwrite(HOST_IA32_SYSENTER_EIP, rdmsr(MSR_IA32_SYSENTER_EIP));
 }

+79

tools/testing/selftests/kvm/x86_64/svm_vmcall_test.c

@@ -0,0 +1,79 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * svm_vmcall_test
+ *
+ * Copyright (C) 2020, Red Hat, Inc.
+ *
+ * Nested SVM testing: VMCALL
+ */
+
+#include "test_util.h"
+#include "kvm_util.h"
+#include "processor.h"
+#include "svm_util.h"
+
+#define VCPU_ID		5
+
+static struct kvm_vm *vm;
+
+static void l2_guest_code(struct svm_test_data *svm)
+{
+	__asm__ __volatile__("vmcall");
+}
+
+static void l1_guest_code(struct svm_test_data *svm)
+{
+	#define L2_GUEST_STACK_SIZE 64
+	unsigned long l2_guest_stack[L2_GUEST_STACK_SIZE];
+	struct vmcb *vmcb = svm->vmcb;
+
+	/* Prepare for L2 execution. */
+	generic_svm_setup(svm, l2_guest_code,
+			  &l2_guest_stack[L2_GUEST_STACK_SIZE]);
+
+	run_guest(vmcb, svm->vmcb_gpa);
+
+	GUEST_ASSERT(vmcb->control.exit_code == SVM_EXIT_VMMCALL);
+	GUEST_DONE();
+}
+
+int main(int argc, char *argv[])
+{
+	vm_vaddr_t svm_gva;
+
+	nested_svm_check_supported();
+
+	vm = vm_create_default(VCPU_ID, 0, (void *) l1_guest_code);
+	vcpu_set_cpuid(vm, VCPU_ID, kvm_get_supported_cpuid());
+
+	vcpu_alloc_svm(vm, &svm_gva);
+	vcpu_args_set(vm, VCPU_ID, 1, svm_gva);
+
+	for (;;) {
+		volatile struct kvm_run *run = vcpu_state(vm, VCPU_ID);
+		struct ucall uc;
+
+		vcpu_run(vm, VCPU_ID);
+		TEST_ASSERT(run->exit_reason == KVM_EXIT_IO,
+			    "Got exit_reason other than KVM_EXIT_IO: %u (%s)\n",
+			    run->exit_reason,
+			    exit_reason_str(run->exit_reason));
+
+		switch (get_ucall(vm, VCPU_ID, &uc)) {
+		case UCALL_ABORT:
+			TEST_ASSERT(false, "%s",
+				    (const char *)uc.args[0]);
+			/* NOT REACHED */
+		case UCALL_SYNC:
+			break;
+		case UCALL_DONE:
+			goto done;
+		default:
+			TEST_ASSERT(false,
+				    "Unknown ucall 0x%x.", uc.cmd);
+		}
+	}
+done:
+	kvm_vm_free(vm);
+	return 0;
+}

-12

virt/kvm/arm/vgic/vgic-mmio.c

@@ -179,18 +179,6 @@
 	return value;
 }
 
-/*
- * This function will return the VCPU that performed the MMIO access and
- * trapped from within the VM, and will return NULL if this is a userspace
- * access.
- *
- * We can disable preemption locally around accessing the per-CPU variable,
- * and use the resolved vcpu pointer after enabling preemption again, because
- * even if the current thread is migrated to another CPU, reading the per-CPU
- * value later will give us the same value as we update the per-CPU variable
- * in the preempt notifier handlers.
- */
-
 /* Must be called with irq->irq_lock held */
 static void vgic_hw_irq_spending(struct kvm_vcpu *vcpu, struct vgic_irq *irq,
 				 bool is_uaccess)

+13 -3

virt/kvm/kvm_main.c

@@ -4409,12 +4409,22 @@
 
 /**
  * kvm_get_running_vcpu - get the vcpu running on the current CPU.
- * Thanks to preempt notifiers, this can also be called from
- * preemptible context.
+ *
+ * We can disable preemption locally around accessing the per-CPU variable,
+ * and use the resolved vcpu pointer after enabling preemption again,
+ * because even if the current thread is migrated to another CPU, reading
+ * the per-CPU value later will give us the same value as we update the
+ * per-CPU variable in the preempt notifier handlers.
  */
 struct kvm_vcpu *kvm_get_running_vcpu(void)
 {
-	return __this_cpu_read(kvm_running_vcpu);
+	struct kvm_vcpu *vcpu;
+
+	preempt_disable();
+	vcpu = __this_cpu_read(kvm_running_vcpu);
+	preempt_enable();
+
+	return vcpu;
 }
 
 /**
