Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

Merge tag 'vfio-v7.1-rc1' of https://github.com/awilliam/linux-vfio

Pull VFIO updates from Alex Williamson:

- Update QAT vfio-pci variant driver for Gen 5, 420xx devices (Vijay
Sundar Selvamani, Suman Kumar Chakraborty, Giovanni Cabiddu)

- Fix MMIO DMA mapping test in the vfio selftests (Alex Mastro)

- Conversions to const struct class in support of class_create()
deprecation (Jori Koolstra)

- Improve selftest compiler compatibility by avoiding initializer on
variable-length array (Manish Honap)

- Define new uAPI for drivers supporting migration to advise userspace
of new initial data for reducing target startup latency.
Implemented for mlx5 vfio-pci variant driver (Yishai Hadas)

- Enable vfio selftests on aarch64, not just cross-compiles reporting
arm64 (Ted Logan)

- Update vfio selftest driver support to include additional DSA devices
(Yi Lai)

- Unconditionally include debugfs root pointer in vfio device struct,
avoiding a build failure seen in hisi_acc variant driver without
debugfs otherwise (Arnd Bergmann)

- Add support for the s390 ISM (Internal Shared Memory) device via a
new variant driver. The device is unique in the size of its BAR space
(256TiB) and lack of mmap support (Julian Ruess)

- Enforce that vfio-pci drivers implement a name in their ops structure
for use in sequestering SR-IOV VFs (Alex Williamson)

- Prune leftover group notifier code (Paolo Bonzini)

- Fix Xe vfio-pci variant driver to avoid migration support as a
dependency in the reset path and missing release call (Michał
Winiarski)
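The new pre-copy uAPI summarized above changes the userspace migration loop: a VFIO_MIG_GET_PRECOPY_INFO result may now announce that the device produced fresh initial data, so the caller restarts reading instead of only draining dirty bytes. A minimal sketch of that decision in plain C — the struct mirrors the uAPI struct vfio_precopy_info, but PRECOPY_INFO_REINIT's bit value and the precopy_next_step() helper are illustrative assumptions, not the kernel's definitions:

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors uAPI struct vfio_precopy_info (linux/vfio.h). */
struct precopy_info {
    uint32_t argsz;
    uint32_t flags;
    uint64_t initial_bytes;
    uint64_t dirty_bytes;
};

/* Hypothetical bit value; the real VFIO_PRECOPY_INFO_REINIT flag
 * is defined by this series in the uAPI header. */
#define PRECOPY_INFO_REINIT (1u << 0)

enum next_step { RESTART_INITIAL, KEEP_READING, STOP_COPY_READY };

/* Decide the next step of the pre-copy loop from an ioctl result:
 * on REINIT, discard progress and re-read the new initial data;
 * otherwise keep streaming until initial and dirty bytes drain. */
static enum next_step precopy_next_step(const struct precopy_info *info)
{
    if (info->flags & PRECOPY_INFO_REINIT)
        return RESTART_INITIAL;
    if (info->initial_bytes || info->dirty_bytes)
        return KEEP_READING;
    return STOP_COPY_READY;
}
```

The point of the REINIT advisory is that the target can be restarted from the new initial data rather than replaying the whole stream, reducing startup latency.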

* tag 'vfio-v7.1-rc1' of https://github.com/awilliam/linux-vfio: (23 commits)
vfio/xe: Add a missing vfio_pci_core_release_dev()
vfio/xe: Reorganize the init to decouple migration from reset
vfio: remove dead notifier code
vfio/pci: Require vfio_device_ops.name
MAINTAINERS: add VFIO ISM PCI DRIVER section
vfio/ism: Implement vfio_pci driver for ISM devices
vfio/pci: Rename vfio_config_do_rw() to vfio_pci_config_rw_single() and export it
vfio: unhide vdev->debug_root
vfio/qat: add support for Intel QAT 420xx VFs
vfio: selftests: Support DMR and GNR-D DSA devices
vfio: selftests: Build tests on aarch64
vfio/mlx5: Add REINIT support to VFIO_MIG_GET_PRECOPY_INFO
vfio/mlx5: consider inflight SAVE during PRE_COPY
net/mlx5: Add IFC bits for migration state
vfio: Adapt drivers to use the core helper vfio_check_precopy_ioctl
vfio: Add support for VFIO_DEVICE_FEATURE_MIG_PRECOPY_INFOv2
vfio: Define uAPI for re-init initial bytes during the PRE_COPY phase
vfio: selftests: Fix VLA initialisation in vfio_pci_irq_set()
vfio: uapi: fix comment typo
vfio: mdev: replace mtty_dev->vd_class with a const struct class
...

+741 -168
+8 -12
Documentation/arch/s390/vfio-ap.rst
···
 * callback interfaces

 open_device:
-  The vfio_ap driver uses this callback to register a
-  VFIO_GROUP_NOTIFY_SET_KVM notifier callback function for the matrix mdev
-  devices. The open_device callback is invoked by userspace to connect the
-  VFIO iommu group for the matrix mdev device to the MDEV bus. Access to the
-  KVM structure used to configure the KVM guest is provided via this callback.
-  The KVM structure, is used to configure the guest's access to the AP matrix
-  defined via the vfio_ap mediated device's sysfs attribute files.
+  the open_device callback is invoked by userspace to connect the
+  VFIO iommu group for the matrix mdev device to the MDEV bus. The
+  callback retrieves the KVM structure used to configure the KVM guest
+  and configures the guest's access to the AP matrix defined via the
+  vfio_ap mediated device's sysfs attribute files.

 close_device:
-  unregisters the VFIO_GROUP_NOTIFY_SET_KVM notifier callback function for the
-  matrix mdev device and deconfigures the guest's AP matrix.
+  this callback deconfigures the guest's AP matrix.

 ioctl:
   this callback handles the VFIO_DEVICE_GET_INFO and VFIO_DEVICE_RESET ioctls
···
 Configure the guest's AP resources
 ----------------------------------
-Configuring the AP resources for a KVM guest will be performed when the
-VFIO_GROUP_NOTIFY_SET_KVM notifier callback is invoked. The notifier
-function is called when userspace connects to KVM. The guest's AP resources are
+Configuring the AP resources for a KVM guest will be performed at the
+time of ``open_device`` and ``close_device``. The guest's AP resources are
 configured via its APCB by:

 * Setting the bits in the APM corresponding to the APIDs assigned to the
+6
MAINTAINERS
···
 S: Maintained
 F: drivers/vfio/pci/hisilicon/

+VFIO ISM PCI DRIVER
+M: Julian Ruess <julianr@linux.ibm.com>
+L: kvm@vger.kernel.org
+S: Maintained
+F: drivers/vfio/pci/ism/
+
 VFIO MEDIATED DEVICE DRIVERS
 M: Kirti Wankhede <kwankhede@nvidia.com>
 L: kvm@vger.kernel.org
+11 -15
drivers/vfio/group.c
···
 #include <linux/anon_inodes.h>
 #include "vfio.h"

+static char *vfio_devnode(const struct device *, umode_t *);
+static const struct class vfio_class = {
+    .name = "vfio",
+    .devnode = vfio_devnode
+};
+
 static struct vfio {
-    struct class *class;
     struct list_head group_list;
     struct mutex group_lock; /* locks group_list */
     struct ida group_ida;
···
     * Device FDs hold a group file reference, therefore the group release
     * is only called when there are no open devices.
     */
-    WARN_ON(group->notifier.head);
     if (group->container)
         vfio_group_detach_container(group);
     if (group->iommufd) {
···
     device_initialize(&group->dev);
     group->dev.devt = MKDEV(MAJOR(vfio.group_devt), minor);
-    group->dev.class = vfio.class;
+    group->dev.class = &vfio_class;
     group->dev.release = vfio_group_release;
     cdev_init(&group->cdev, &vfio_group_fops);
     group->cdev.owner = THIS_MODULE;
···
     /* put in vfio_group_release() */
     iommu_group_ref_get(iommu_group);
     group->type = type;
-    BLOCKING_INIT_NOTIFIER_HEAD(&group->notifier);

     return group;
 }
···
     * properly hold the group reference.
     */
    WARN_ON(!list_empty(&group->device_list));
-   WARN_ON(group->notifier.head);

    /*
     * Revoke all users of group->iommu_group. At this point we know there ···
···
         return ret;

     /* /dev/vfio/$GROUP */
-    vfio.class = class_create("vfio");
-    if (IS_ERR(vfio.class)) {
-        ret = PTR_ERR(vfio.class);
+    ret = class_register(&vfio_class);
+    if (ret)
         goto err_group_class;
-    }
-
-    vfio.class->devnode = vfio_devnode;

     ret = alloc_chrdev_region(&vfio.group_devt, 0, MINORMASK + 1, "vfio");
     if (ret)
···
     return 0;

 err_alloc_chrdev:
-    class_destroy(vfio.class);
-    vfio.class = NULL;
+    class_unregister(&vfio_class);
 err_group_class:
     vfio_container_cleanup();
     return ret;
···
     WARN_ON(!list_empty(&vfio.group_list));
     ida_destroy(&vfio.group_ida);
     unregister_chrdev_region(vfio.group_devt, MINORMASK + 1);
-    class_destroy(vfio.class);
-    vfio.class = NULL;
+    class_unregister(&vfio_class);
     vfio_container_cleanup();
 }
+2
drivers/vfio/pci/Kconfig
···
 source "drivers/vfio/pci/mlx5/Kconfig"

+source "drivers/vfio/pci/ism/Kconfig"
+
 source "drivers/vfio/pci/hisilicon/Kconfig"

 source "drivers/vfio/pci/pds/Kconfig"
+2
drivers/vfio/pci/Makefile
···
 obj-$(CONFIG_MLX5_VFIO_PCI) += mlx5/

+obj-$(CONFIG_ISM_VFIO_PCI) += ism/
+
 obj-$(CONFIG_HISI_ACC_VFIO_PCI) += hisilicon/

 obj-$(CONFIG_PDS_VFIO_PCI) += pds/
+6 -11
drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
···
     struct hisi_acc_vf_core_device *hisi_acc_vdev = migf->hisi_acc_vdev;
     loff_t *pos = &filp->f_pos;
     struct vfio_precopy_info info;
-    unsigned long minsz;
     int ret;

-    if (cmd != VFIO_MIG_GET_PRECOPY_INFO)
-        return -ENOTTY;
-
-    minsz = offsetofend(struct vfio_precopy_info, dirty_bytes);
-
-    if (copy_from_user(&info, (void __user *)arg, minsz))
-        return -EFAULT;
-    if (info.argsz < minsz)
-        return -EINVAL;
+    ret = vfio_check_precopy_ioctl(&hisi_acc_vdev->core_device.vdev, cmd,
+                                   arg, &info);
+    if (ret)
+        return ret;

     mutex_lock(&hisi_acc_vdev->state_mutex);
     if (hisi_acc_vdev->mig_state != VFIO_DEVICE_STATE_PRE_COPY) {
···
     mutex_unlock(&migf->lock);
     mutex_unlock(&hisi_acc_vdev->state_mutex);

-    return copy_to_user((void __user *)arg, &info, minsz) ? -EFAULT : 0;
+    return copy_to_user((void __user *)arg, &info,
+                        offsetofend(struct vfio_precopy_info, dirty_bytes)) ? -EFAULT : 0;
 out:
     mutex_unlock(&migf->lock);
     mutex_unlock(&hisi_acc_vdev->state_mutex);
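The hunk above (and the matching qat and mlx5 changes further down) replaces each driver's open-coded cmd and argsz validation with the new vfio_check_precopy_ioctl() core helper. A userspace-runnable sketch of the check being centralized — the struct layout mirrors uAPI struct vfio_precopy_info, and check_precopy_argsz() is a hypothetical stand-in, not the kernel helper's real signature:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* offsetofend() as defined in the kernel: offset one past a member. */
#define offsetofend(TYPE, MEMBER) \
    (offsetof(TYPE, MEMBER) + sizeof(((TYPE *)0)->MEMBER))

struct precopy_info {           /* mirrors uAPI struct vfio_precopy_info */
    uint32_t argsz;
    uint32_t flags;
    uint64_t initial_bytes;
    uint64_t dirty_bytes;
};

/* Sketch of the argsz validation every driver used to open-code:
 * the user-supplied argsz must cover the struct at least up to and
 * including dirty_bytes. */
static int check_precopy_argsz(uint32_t argsz)
{
    size_t minsz = offsetofend(struct precopy_info, dirty_bytes);

    return argsz < minsz ? -1 : 0;  /* -EINVAL in the kernel */
}
```

Centralizing this means a future extension of the struct (such as the v2 info used by the REINIT uAPI) only needs one place to grow its minimum-size check.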
+10
drivers/vfio/pci/ism/Kconfig
(new file)
# SPDX-License-Identifier: GPL-2.0
config ISM_VFIO_PCI
    tristate "VFIO support for ISM devices"
    depends on S390
    select VFIO_PCI_CORE
    help
      This provides user space support for IBM Internal Shared Memory (ISM)
      Adapter devices using the VFIO framework.

      If you don't know what to do here, say N.
+3
drivers/vfio/pci/ism/Makefile
(new file)
# SPDX-License-Identifier: GPL-2.0
obj-$(CONFIG_ISM_VFIO_PCI) += ism-vfio-pci.o
ism-vfio-pci-y := main.o
+408
drivers/vfio/pci/ism/main.c
(new file)
// SPDX-License-Identifier: GPL-2.0
/*
 * vfio-ISM driver for s390
 *
 * Copyright IBM Corp.
 */

#include <linux/slab.h>
#include "../vfio_pci_priv.h"

#define ISM_VFIO_PCI_OFFSET_SHIFT 48
#define ISM_VFIO_PCI_OFFSET_TO_INDEX(off) ((off) >> ISM_VFIO_PCI_OFFSET_SHIFT)
#define ISM_VFIO_PCI_INDEX_TO_OFFSET(index) ((u64)(index) << ISM_VFIO_PCI_OFFSET_SHIFT)
#define ISM_VFIO_PCI_OFFSET_MASK (((u64)(1) << ISM_VFIO_PCI_OFFSET_SHIFT) - 1)

/*
 * Use __zpci_load() to bypass automatic use of
 * PCI MIO instructions which are not supported on ISM devices
 */
#define ISM_READ(size)                                                  \
static int ism_read##size(struct zpci_dev *zdev, int bar,               \
                          size_t *filled, char __user *buf,             \
                          loff_t off)                                   \
{                                                                       \
    u64 req, tmp;                                                       \
    u##size val;                                                        \
    int ret;                                                            \
                                                                        \
    req = ZPCI_CREATE_REQ(READ_ONCE(zdev->fh), bar, sizeof(val));       \
    ret = __zpci_load(&tmp, req, off);                                  \
    if (ret)                                                            \
        return ret;                                                     \
    val = (u##size)tmp;                                                 \
    if (copy_to_user(buf, &val, sizeof(val)))                           \
        return -EFAULT;                                                 \
    *filled = sizeof(val);                                              \
    return 0;                                                           \
}

ISM_READ(64);
ISM_READ(32);
ISM_READ(16);
ISM_READ(8);

struct ism_vfio_pci_core_device {
    struct vfio_pci_core_device core_device;
    struct kmem_cache *store_block_cache;
};

static int ism_vfio_pci_open_device(struct vfio_device *core_vdev)
{
    struct ism_vfio_pci_core_device *ivpcd;
    struct vfio_pci_core_device *vdev;
    int ret;

    ivpcd = container_of(core_vdev, struct ism_vfio_pci_core_device,
                         core_device.vdev);
    vdev = &ivpcd->core_device;

    ret = vfio_pci_core_enable(vdev);
    if (ret)
        return ret;

    vfio_pci_core_finish_enable(vdev);
    return 0;
}

/*
 * ism_vfio_pci_do_io_r()
 *
 * On s390, kernel primitives such as ioread() and iowrite() are switched over
 * from function-handle-based PCI load/stores instructions to PCI memory-I/O (MIO)
 * loads/stores when these are available and not explicitly disabled. Since these
 * instructions cannot be used with ISM devices, ensure that classic
 * function-handle-based PCI instructions are used instead.
 */
static ssize_t ism_vfio_pci_do_io_r(struct vfio_pci_core_device *vdev,
                                    char __user *buf, loff_t off, size_t count,
                                    int bar)
{
    struct zpci_dev *zdev = to_zpci(vdev->pdev);
    ssize_t done = 0;
    int ret;

    while (count) {
        size_t filled;

        if (count >= 8 && IS_ALIGNED(off, 8)) {
            ret = ism_read64(zdev, bar, &filled, buf, off);
            if (ret)
                return ret;
        } else if (count >= 4 && IS_ALIGNED(off, 4)) {
            ret = ism_read32(zdev, bar, &filled, buf, off);
            if (ret)
                return ret;
        } else if (count >= 2 && IS_ALIGNED(off, 2)) {
            ret = ism_read16(zdev, bar, &filled, buf, off);
            if (ret)
                return ret;
        } else {
            ret = ism_read8(zdev, bar, &filled, buf, off);
            if (ret)
                return ret;
        }

        count -= filled;
        done += filled;
        off += filled;
        buf += filled;
    }

    return done;
}

/*
 * ism_vfio_pci_do_io_w()
 *
 * Ensure that the PCI store block (PCISTB) instruction is used as required by the
 * ISM device. The ISM device also uses a 256 TiB BAR 0 for write operations,
 * which requires a 48bit region address space (ISM_VFIO_PCI_OFFSET_SHIFT).
 */
static ssize_t ism_vfio_pci_do_io_w(struct vfio_pci_core_device *vdev,
                                    char __user *buf, loff_t off, size_t count,
                                    int bar)
{
    struct zpci_dev *zdev = to_zpci(vdev->pdev);
    struct ism_vfio_pci_core_device *ivpcd;
    ssize_t ret;
    void *data;
    u64 req;

    if (count > zdev->maxstbl)
        return -EINVAL;
    if (((off % PAGE_SIZE) + count) > PAGE_SIZE)
        return -EINVAL;

    ivpcd = container_of(vdev, struct ism_vfio_pci_core_device,
                         core_device);
    data = kmem_cache_alloc(ivpcd->store_block_cache, GFP_KERNEL);
    if (!data)
        return -ENOMEM;

    if (copy_from_user(data, buf, count)) {
        ret = -EFAULT;
        goto out_free;
    }

    req = ZPCI_CREATE_REQ(READ_ONCE(zdev->fh), bar, count);
    ret = __zpci_store_block(data, req, off);
    if (ret)
        goto out_free;

    ret = count;

out_free:
    kmem_cache_free(ivpcd->store_block_cache, data);
    return ret;
}

static ssize_t ism_vfio_pci_bar_rw(struct vfio_pci_core_device *vdev,
                                   char __user *buf, size_t count, loff_t *ppos,
                                   bool iswrite)
{
    int bar = ISM_VFIO_PCI_OFFSET_TO_INDEX(*ppos);
    loff_t pos = *ppos & ISM_VFIO_PCI_OFFSET_MASK;
    resource_size_t end;
    ssize_t done = 0;

    if (pci_resource_start(vdev->pdev, bar))
        end = pci_resource_len(vdev->pdev, bar);
    else
        return -EINVAL;

    if (pos >= end)
        return -EINVAL;

    count = min(count, (size_t)(end - pos));

    if (iswrite)
        done = ism_vfio_pci_do_io_w(vdev, buf, pos, count, bar);
    else
        done = ism_vfio_pci_do_io_r(vdev, buf, pos, count, bar);

    if (done >= 0)
        *ppos += done;

    return done;
}

static ssize_t ism_vfio_pci_config_rw(struct vfio_pci_core_device *vdev,
                                      char __user *buf, size_t count,
                                      loff_t *ppos, bool iswrite)
{
    loff_t pos = *ppos;
    size_t done = 0;
    int ret = 0;

    pos &= ISM_VFIO_PCI_OFFSET_MASK;

    while (count) {
        /*
         * zPCI must not use MIO instructions for config space access,
         * so we can use common code path here.
         */
        ret = vfio_pci_config_rw_single(vdev, buf, count, &pos, iswrite);
        if (ret < 0)
            return ret;

        count -= ret;
        done += ret;
        buf += ret;
        pos += ret;
    }

    *ppos += done;

    return done;
}

static ssize_t ism_vfio_pci_rw(struct vfio_device *core_vdev, char __user *buf,
                               size_t count, loff_t *ppos, bool iswrite)
{
    unsigned int index = ISM_VFIO_PCI_OFFSET_TO_INDEX(*ppos);
    struct vfio_pci_core_device *vdev;
    int ret;

    vdev = container_of(core_vdev, struct vfio_pci_core_device, vdev);

    if (!count)
        return 0;

    switch (index) {
    case VFIO_PCI_CONFIG_REGION_INDEX:
        ret = ism_vfio_pci_config_rw(vdev, buf, count, ppos, iswrite);
        break;

    case VFIO_PCI_BAR0_REGION_INDEX ... VFIO_PCI_BAR5_REGION_INDEX:
        ret = ism_vfio_pci_bar_rw(vdev, buf, count, ppos, iswrite);
        break;

    default:
        return -EINVAL;
    }

    return ret;
}

static ssize_t ism_vfio_pci_read(struct vfio_device *core_vdev,
                                 char __user *buf, size_t count, loff_t *ppos)
{
    return ism_vfio_pci_rw(core_vdev, buf, count, ppos, false);
}

static ssize_t ism_vfio_pci_write(struct vfio_device *core_vdev,
                                  const char __user *buf, size_t count,
                                  loff_t *ppos)
{
    return ism_vfio_pci_rw(core_vdev, (char __user *)buf, count, ppos,
                           true);
}

static int ism_vfio_pci_ioctl_get_region_info(struct vfio_device *core_vdev,
                                              struct vfio_region_info *info,
                                              struct vfio_info_cap *caps)
{
    struct vfio_pci_core_device *vdev =
        container_of(core_vdev, struct vfio_pci_core_device, vdev);
    struct pci_dev *pdev = vdev->pdev;

    switch (info->index) {
    case VFIO_PCI_CONFIG_REGION_INDEX:
        info->offset = ISM_VFIO_PCI_INDEX_TO_OFFSET(info->index);
        info->size = pdev->cfg_size;
        info->flags = VFIO_REGION_INFO_FLAG_READ |
                      VFIO_REGION_INFO_FLAG_WRITE;
        break;
    case VFIO_PCI_BAR0_REGION_INDEX ... VFIO_PCI_BAR5_REGION_INDEX:
        info->offset = ISM_VFIO_PCI_INDEX_TO_OFFSET(info->index);
        info->size = pci_resource_len(pdev, info->index);
        if (!info->size) {
            info->flags = 0;
            break;
        }
        info->flags = VFIO_REGION_INFO_FLAG_READ |
                      VFIO_REGION_INFO_FLAG_WRITE;
        break;
    default:
        info->offset = 0;
        info->size = 0;
        info->flags = 0;
        return -EINVAL;
    }
    return 0;
}

static int ism_vfio_pci_init_dev(struct vfio_device *core_vdev)
{
    struct zpci_dev *zdev = to_zpci(to_pci_dev(core_vdev->dev));
    struct ism_vfio_pci_core_device *ivpcd;
    char cache_name[20];
    int ret;

    ivpcd = container_of(core_vdev, struct ism_vfio_pci_core_device,
                         core_device.vdev);

    snprintf(cache_name, sizeof(cache_name), "ism_sb_fid_%08x", zdev->fid);

    ivpcd->store_block_cache =
        kmem_cache_create(cache_name, zdev->maxstbl,
                          (&(struct kmem_cache_args){
                              .align = PAGE_SIZE,
                              .useroffset = 0,
                              .usersize = zdev->maxstbl,
                          }),
                          (SLAB_RECLAIM_ACCOUNT | SLAB_ACCOUNT));
    if (!ivpcd->store_block_cache)
        return -ENOMEM;

    ret = vfio_pci_core_init_dev(core_vdev);
    if (ret)
        kmem_cache_destroy(ivpcd->store_block_cache);

    return ret;
}

static void ism_vfio_pci_release_dev(struct vfio_device *core_vdev)
{
    struct ism_vfio_pci_core_device *ivpcd = container_of(
        core_vdev, struct ism_vfio_pci_core_device, core_device.vdev);

    kmem_cache_destroy(ivpcd->store_block_cache);
    vfio_pci_core_release_dev(core_vdev);
}

static const struct vfio_device_ops ism_pci_ops = {
    .name = "ism-vfio-pci",
    .init = ism_vfio_pci_init_dev,
    .release = ism_vfio_pci_release_dev,
    .open_device = ism_vfio_pci_open_device,
    .close_device = vfio_pci_core_close_device,
    .ioctl = vfio_pci_core_ioctl,
    .get_region_info_caps = ism_vfio_pci_ioctl_get_region_info,
    .device_feature = vfio_pci_core_ioctl_feature,
    .read = ism_vfio_pci_read,
    .write = ism_vfio_pci_write,
    .request = vfio_pci_core_request,
    .match = vfio_pci_core_match,
    .match_token_uuid = vfio_pci_core_match_token_uuid,
    .bind_iommufd = vfio_iommufd_physical_bind,
    .unbind_iommufd = vfio_iommufd_physical_unbind,
    .attach_ioas = vfio_iommufd_physical_attach_ioas,
    .detach_ioas = vfio_iommufd_physical_detach_ioas,
};

static int ism_vfio_pci_probe(struct pci_dev *pdev,
                              const struct pci_device_id *id)
{
    struct ism_vfio_pci_core_device *ivpcd;
    int ret;

    ivpcd = vfio_alloc_device(ism_vfio_pci_core_device, core_device.vdev,
                              &pdev->dev, &ism_pci_ops);
    if (IS_ERR(ivpcd))
        return PTR_ERR(ivpcd);

    dev_set_drvdata(&pdev->dev, &ivpcd->core_device);

    ret = vfio_pci_core_register_device(&ivpcd->core_device);
    if (ret)
        vfio_put_device(&ivpcd->core_device.vdev);

    return ret;
}

static void ism_vfio_pci_remove(struct pci_dev *pdev)
{
    struct vfio_pci_core_device *core_device;
    struct ism_vfio_pci_core_device *ivpcd;

    core_device = dev_get_drvdata(&pdev->dev);
    ivpcd = container_of(core_device, struct ism_vfio_pci_core_device,
                         core_device);

    vfio_pci_core_unregister_device(&ivpcd->core_device);
    vfio_put_device(&ivpcd->core_device.vdev);
}

static const struct pci_device_id ism_device_table[] = {
    { PCI_DRIVER_OVERRIDE_DEVICE_VFIO(PCI_VENDOR_ID_IBM,
                                      PCI_DEVICE_ID_IBM_ISM) },
    {}
};
MODULE_DEVICE_TABLE(pci, ism_device_table);

static struct pci_driver ism_vfio_pci_driver = {
    .name = KBUILD_MODNAME,
    .id_table = ism_device_table,
    .probe = ism_vfio_pci_probe,
    .remove = ism_vfio_pci_remove,
    .err_handler = &vfio_pci_core_err_handlers,
    .driver_managed_dma = true,
};

module_pci_driver(ism_vfio_pci_driver);

MODULE_LICENSE("GPL");
MODULE_DESCRIPTION("vfio-pci variant driver for the IBM Internal Shared Memory (ISM) device");
MODULE_AUTHOR("IBM Corporation");
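The driver above packs the VFIO region index into the top bits of the file offset, leaving 48 bits of per-region offset so the full 256 TiB (2^48-byte) BAR 0 stays addressable. The encoding is pure arithmetic and can be checked in isolation (macro names shortened from the ISM_VFIO_PCI_* originals; otherwise the same shift-and-mask scheme):

```c
#include <assert.h>
#include <stdint.h>

/* Region index lives above bit 48; bits 0..47 are the in-region offset. */
#define OFFSET_SHIFT 48
#define OFFSET_TO_INDEX(off)  ((uint64_t)(off) >> OFFSET_SHIFT)
#define INDEX_TO_OFFSET(idx)  ((uint64_t)(idx) << OFFSET_SHIFT)
#define OFFSET_MASK           ((UINT64_C(1) << OFFSET_SHIFT) - 1)
```

The read/write paths recover the BAR with OFFSET_TO_INDEX() and the in-BAR position with OFFSET_MASK, so a single 64-bit file offset addresses both the region and any byte inside the huge BAR.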
+21 -4
drivers/vfio/pci/mlx5/cmd.c
···
 int mlx5vf_cmd_query_vhca_migration_state(struct mlx5vf_pci_core_device *mvdev,
                                           size_t *state_size, u64 *total_size,
-                                          u8 query_flags)
+                                          u8 *mig_state, u8 query_flags)
 {
     u32 out[MLX5_ST_SZ_DW(query_vhca_migration_state_out)] = {};
     u32 in[MLX5_ST_SZ_DW(query_vhca_migration_state_in)] = {};
···
     *total_size = mvdev->chunk_mode ?
         MLX5_GET64(query_vhca_migration_state_out, out,
                    remaining_total_size) : *state_size;
+
+    if (mig_state && mvdev->mig_state_cap)
+        *mig_state = MLX5_GET(query_vhca_migration_state_out, out,
+                              migration_state);

     return 0;
 }
···
     if (MLX5_CAP_GEN_2(mvdev->mdev, migration_in_chunks))
         mvdev->chunk_mode = 1;
+
+    if (MLX5_CAP_GEN_2(mvdev->mdev, migration_state))
+        mvdev->mig_state_cap = 1;

 end:
     mlx5_vf_put_core_dev(mvdev->mdev);
···
 {
     spin_lock_irq(&buf->migf->list_lock);
     buf->stop_copy_chunk_num = 0;
+    buf->pre_copy_init_bytes_chunk = false;
     list_add_tail(&buf->buf_elm, &buf->migf->avail_list);
     spin_unlock_irq(&buf->migf->list_lock);
 }
···
 mlx5vf_save_callback_complete(struct mlx5_vf_migration_file *migf,
                               struct mlx5vf_async_data *async_data)
 {
+    migf->inflight_save = 0;
+    wake_up_interruptible(&migf->poll_wait);
     kvfree(async_data->out);
     complete(&migf->save_comp);
     fput(migf->filp);
···
         !next_required_umem_size;
     if (async_data->header_buf) {
         status = add_buf_header(async_data->header_buf, image_size,
-                                initial_pre_copy);
+                                initial_pre_copy ||
+                                async_data->buf->pre_copy_init_bytes_chunk);
         if (status)
             goto err;
     }
···
         }
     }
     spin_unlock_irqrestore(&migf->list_lock, flags);
-    if (initial_pre_copy) {
+    if (initial_pre_copy || async_data->buf->pre_copy_init_bytes_chunk) {
         migf->pre_copy_initial_bytes += image_size;
-        migf->state = MLX5_MIGF_STATE_PRE_COPY;
+        if (initial_pre_copy)
+            migf->state = MLX5_MIGF_STATE_PRE_COPY;
+        if (async_data->buf->pre_copy_init_bytes_chunk)
+            async_data->buf->pre_copy_init_bytes_chunk = false;
     }
     if (stop_copy_last_chunk)
         migf->state = MLX5_MIGF_STATE_COMPLETE;
···
     async_data->header_buf = header_buf;
     get_file(migf->filp);
+    migf->inflight_save = 1;
     err = mlx5_cmd_exec_cb(&migf->async_ctx, in, sizeof(in),
                            async_data->out,
                            out_size, mlx5vf_save_callback,
···
     return 0;

 err_exec:
+    migf->inflight_save = 0;
+    wake_up_interruptible(&migf->poll_wait);
     if (header_buf)
         mlx5vf_put_data_buffer(header_buf);
     fput(migf->filp);
+5 -1
drivers/vfio/pci/mlx5/cmd.h
···
     u32 *mkey_in;
     enum dma_data_direction dma_dir;
     u8 stop_copy_chunk_num;
+    bool pre_copy_init_bytes_chunk;
     struct list_head buf_elm;
     struct mlx5_vf_migration_file *migf;
 };
···
     u32 record_tag;
     u64 stop_copy_prep_size;
     u64 pre_copy_initial_bytes;
+    u64 pre_copy_initial_bytes_start;
     size_t next_required_umem_size;
     u8 num_ready_chunks;
     /* Upon chunk mode preserve another set of buffers for stop_copy phase */
···
     struct completion save_comp;
     struct mlx5_async_ctx async_ctx;
     struct mlx5vf_async_data async_data;
+    u8 inflight_save:1;
 };

 struct mlx5_vhca_cq_buf {
···
     u8 mdev_detach:1;
     u8 log_active:1;
     u8 chunk_mode:1;
+    u8 mig_state_cap:1;
     struct completion tracker_comp;
     /* protect migration state */
     struct mutex state_mutex;
···
 int mlx5vf_cmd_resume_vhca(struct mlx5vf_pci_core_device *mvdev, u16 op_mod);
 int mlx5vf_cmd_query_vhca_migration_state(struct mlx5vf_pci_core_device *mvdev,
                                           size_t *state_size, u64 *total_size,
-                                          u8 query_flags);
+                                          u8 *migration_state, u8 query_flags);
 void mlx5vf_cmd_set_migratable(struct mlx5vf_pci_core_device *mvdev,
                                const struct vfio_migration_ops *mig_ops,
                                const struct vfio_log_ops *log_ops);
+74 -50
drivers/vfio/pci/mlx5/main.c
···
         !list_empty(&migf->buf_list) ||
         migf->state == MLX5_MIGF_STATE_ERROR ||
         migf->state == MLX5_MIGF_STATE_PRE_COPY_ERROR ||
-        migf->state == MLX5_MIGF_STATE_PRE_COPY ||
+        (migf->state == MLX5_MIGF_STATE_PRE_COPY &&
+         !migf->inflight_save) ||
         migf->state == MLX5_MIGF_STATE_COMPLETE))
         return -ERESTARTSYS;
 }
···
     struct mlx5_vhca_data_buffer *buf;
     struct vfio_precopy_info info = {};
     loff_t *pos = &filp->f_pos;
-    unsigned long minsz;
+    u8 migration_state = 0;
     size_t inc_length = 0;
-    bool end_of_data = false;
+    bool reinit_state;
+    bool end_of_data;
     int ret;

-    if (cmd != VFIO_MIG_GET_PRECOPY_INFO)
-        return -ENOTTY;
-
-    minsz = offsetofend(struct vfio_precopy_info, dirty_bytes);
-
-    if (copy_from_user(&info, (void __user *)arg, minsz))
-        return -EFAULT;
-
-    if (info.argsz < minsz)
-        return -EINVAL;
+    ret = vfio_check_precopy_ioctl(&mvdev->core_device.vdev, cmd, arg,
+                                   &info);
+    if (ret)
+        return ret;

     mutex_lock(&mvdev->state_mutex);
     if (mvdev->mig_state != VFIO_DEVICE_STATE_PRE_COPY &&
···
      * As so, the other code below is safe with the proper locks.
      */
     ret = mlx5vf_cmd_query_vhca_migration_state(mvdev, &inc_length,
-                                                NULL, MLX5VF_QUERY_INC);
+                                                NULL, &migration_state,
+                                                MLX5VF_QUERY_INC);
     if (ret)
         goto err_state_unlock;
 }
···
         goto err_migf_unlock;
     }

-    if (migf->pre_copy_initial_bytes > *pos) {
-        info.initial_bytes = migf->pre_copy_initial_bytes - *pos;
-    } else {
-        info.dirty_bytes = migf->max_pos - *pos;
-        if (!info.dirty_bytes)
-            end_of_data = true;
-        info.dirty_bytes += inc_length;
-    }
-
-    if (!end_of_data || !inc_length) {
-        mutex_unlock(&migf->lock);
-        goto done;
-    }
-
-    mutex_unlock(&migf->lock);
     /*
-     * We finished transferring the current state and the device has a
-     * dirty state, save a new state to be ready for.
+     * opt-in for VFIO_DEVICE_FEATURE_MIG_PRECOPY_INFOv2 serves
+     * as opt-in for VFIO_PRECOPY_INFO_REINIT as well
      */
-    buf = mlx5vf_get_data_buffer(migf, DIV_ROUND_UP(inc_length, PAGE_SIZE),
-                                 DMA_FROM_DEVICE);
-    if (IS_ERR(buf)) {
-        ret = PTR_ERR(buf);
-        mlx5vf_mark_err(migf);
-        goto err_state_unlock;
+    reinit_state = mvdev->core_device.vdev.precopy_info_v2 &&
+        migration_state == MLX5_QUERY_VHCA_MIG_STATE_OPER_MIGRATION_INIT;
+    end_of_data = !(migf->max_pos - *pos);
+    if (reinit_state) {
+        /*
+         * Any bytes already present in memory are treated as initial
+         * bytes, since the caller is required to read them before
+         * reaching the new initial-bytes region.
+         */
+        migf->pre_copy_initial_bytes_start = *pos;
+        migf->pre_copy_initial_bytes = migf->max_pos - *pos;
+        info.initial_bytes = migf->pre_copy_initial_bytes + inc_length;
+        info.flags |= VFIO_PRECOPY_INFO_REINIT;
+    } else {
+        if (migf->pre_copy_initial_bytes_start +
+            migf->pre_copy_initial_bytes > *pos) {
+            WARN_ON_ONCE(end_of_data);
+            info.initial_bytes = migf->pre_copy_initial_bytes_start +
+                migf->pre_copy_initial_bytes - *pos;
+        } else {
+            info.dirty_bytes = (migf->max_pos - *pos) + inc_length;
+        }
+    }
+    mutex_unlock(&migf->lock);
+
+    if ((reinit_state || end_of_data) && inc_length) {
+        /*
+         * In case we finished transferring the current state and the
+         * device has a dirty state, or that the device has a new init
+         * state, save a new state to be ready for.
+         */
+        buf = mlx5vf_get_data_buffer(migf, DIV_ROUND_UP(inc_length, PAGE_SIZE),
+                                     DMA_FROM_DEVICE);
+        if (IS_ERR(buf)) {
+            ret = PTR_ERR(buf);
+            mlx5vf_mark_err(migf);
+            goto err_state_unlock;
+        }
+
+        buf->pre_copy_init_bytes_chunk = reinit_state;
+        ret = mlx5vf_cmd_save_vhca_state(mvdev, migf, buf, true, true);
+        if (ret) {
+            mlx5vf_mark_err(migf);
+            mlx5vf_put_data_buffer(buf);
+            goto err_state_unlock;
+        }
+
+        /*
+         * SAVE appends a header record via add_buf_header(),
+         * let's account it as well.
+         */
+        if (reinit_state)
+            info.initial_bytes += sizeof(struct mlx5_vf_migration_header);
+        else
+            info.dirty_bytes += sizeof(struct mlx5_vf_migration_header);
     }

-    ret = mlx5vf_cmd_save_vhca_state(mvdev, migf, buf, true, true);
-    if (ret) {
-        mlx5vf_mark_err(migf);
-        mlx5vf_put_data_buffer(buf);
-        goto err_state_unlock;
-    }
-
-done:
     mlx5vf_state_mutex_unlock(mvdev);
-    if (copy_to_user((void __user *)arg, &info, minsz))
+    if (copy_to_user((void __user *)arg, &info,
+                     offsetofend(struct vfio_precopy_info, dirty_bytes)))
         return -EFAULT;
     return 0;
···
     if (migf->state == MLX5_MIGF_STATE_ERROR)
         return -ENODEV;

-    ret = mlx5vf_cmd_query_vhca_migration_state(mvdev, &length, NULL,
+    ret = mlx5vf_cmd_query_vhca_migration_state(mvdev, &length, NULL, NULL,
             MLX5VF_QUERY_INC | MLX5VF_QUERY_FINAL);
     if (ret)
         goto err;
···
     if (ret)
         goto out;

-    ret = mlx5vf_cmd_query_vhca_migration_state(mvdev, &length, &full_size, 0);
+    ret = mlx5vf_cmd_query_vhca_migration_state(mvdev, &length, &full_size, NULL, 0);
     if (ret)
         goto out_pd;
···
     enum mlx5_vf_migf_state state;
     size_t size;

-    ret = mlx5vf_cmd_query_vhca_migration_state(mvdev, &size, NULL,
+    ret = mlx5vf_cmd_query_vhca_migration_state(mvdev, &size, NULL, NULL,
             MLX5VF_QUERY_INC | MLX5VF_QUERY_CLEANUP);
     if (ret)
         return ERR_PTR(ret);
···
     mutex_lock(&mvdev->state_mutex);
     ret = mlx5vf_cmd_query_vhca_migration_state(mvdev, &state_size,
-                                                &total_size, 0);
+                                                &total_size, NULL, 0);
     if (!ret)
         *stop_copy_length = total_size;
     mlx5vf_state_mutex_unlock(mvdev);
+1 -1
drivers/vfio/pci/qat/Kconfig
···
 config QAT_VFIO_PCI
     tristate "VFIO support for QAT VF PCI devices"
     select VFIO_PCI_CORE
-    depends on CRYPTO_DEV_QAT_4XXX
+    depends on CRYPTO_DEV_QAT_4XXX || CRYPTO_DEV_QAT_420XX || CRYPTO_DEV_QAT_6XXX
     help
       This provides migration support for Intel(R) QAT Virtual Function
       using the VFIO framework.
+8 -11
drivers/vfio/pci/qat/main.c
···
     struct qat_mig_dev *mig_dev = qat_vdev->mdev;
     struct vfio_precopy_info info;
     loff_t *pos = &filp->f_pos;
-    unsigned long minsz;
     int ret = 0;
 
-    if (cmd != VFIO_MIG_GET_PRECOPY_INFO)
-        return -ENOTTY;
-
-    minsz = offsetofend(struct vfio_precopy_info, dirty_bytes);
-
-    if (copy_from_user(&info, (void __user *)arg, minsz))
-        return -EFAULT;
-    if (info.argsz < minsz)
-        return -EINVAL;
+    ret = vfio_check_precopy_ioctl(&qat_vdev->core_device.vdev, cmd, arg,
+                       &info);
+    if (ret)
+        return ret;
 
     mutex_lock(&qat_vdev->state_mutex);
     if (qat_vdev->mig_state != VFIO_DEVICE_STATE_PRE_COPY &&
···
     mutex_unlock(&qat_vdev->state_mutex);
     if (ret)
         return ret;
-    return copy_to_user((void __user *)arg, &info, minsz) ? -EFAULT : 0;
+    return copy_to_user((void __user *)arg, &info,
+                offsetofend(struct vfio_precopy_info, dirty_bytes)) ? -EFAULT : 0;
 }
 
 static ssize_t qat_vf_save_read(struct file *filp, char __user *buf,
···
     { PCI_DRIVER_OVERRIDE_DEVICE_VFIO(PCI_VENDOR_ID_INTEL, 0x4941) },
     { PCI_DRIVER_OVERRIDE_DEVICE_VFIO(PCI_VENDOR_ID_INTEL, 0x4943) },
     { PCI_DRIVER_OVERRIDE_DEVICE_VFIO(PCI_VENDOR_ID_INTEL, 0x4945) },
+    /* Intel QAT GEN5 420xx VF device */
+    { PCI_DRIVER_OVERRIDE_DEVICE_VFIO(PCI_VENDOR_ID_INTEL, 0x4947) },
     /* Intel QAT GEN6 6xxx VF device */
     { PCI_DRIVER_OVERRIDE_DEVICE_VFIO(PCI_VENDOR_ID_INTEL, 0x4949) },
     {}
+5 -3
drivers/vfio/pci/vfio_pci_config.c
···
     return i;
 }
 
-static ssize_t vfio_config_do_rw(struct vfio_pci_core_device *vdev, char __user *buf,
-                 size_t count, loff_t *ppos, bool iswrite)
+ssize_t vfio_pci_config_rw_single(struct vfio_pci_core_device *vdev,
+                  char __user *buf, size_t count, loff_t *ppos,
+                  bool iswrite)
 {
     struct pci_dev *pdev = vdev->pdev;
     struct perm_bits *perm;
···
 
     return ret;
 }
+EXPORT_SYMBOL_GPL(vfio_pci_config_rw_single);
 
 ssize_t vfio_pci_config_rw(struct vfio_pci_core_device *vdev, char __user *buf,
                size_t count, loff_t *ppos, bool iswrite)
···
     pos &= VFIO_PCI_OFFSET_MASK;
 
     while (count) {
-        ret = vfio_config_do_rw(vdev, buf, count, &pos, iswrite);
+        ret = vfio_pci_config_rw_single(vdev, buf, count, &pos, iswrite);
         if (ret < 0)
             return ret;
 
+4
drivers/vfio/pci/vfio_pci_core.c
···
     if (WARN_ON(vdev != dev_get_drvdata(dev)))
         return -EINVAL;
 
+    /* Drivers must set a name. Required for sequestering SR-IOV VFs */
+    if (WARN_ON(!vdev->vdev.ops->name))
+        return -EINVAL;
+
     if (pdev->hdr_type != PCI_HEADER_TYPE_NORMAL)
         return -EINVAL;
 
+4
drivers/vfio/pci/vfio_pci_priv.h
···
 ssize_t vfio_pci_config_rw(struct vfio_pci_core_device *vdev, char __user *buf,
                size_t count, loff_t *ppos, bool iswrite);
 
+ssize_t vfio_pci_config_rw_single(struct vfio_pci_core_device *vdev,
+                  char __user *buf, size_t count, loff_t *ppos,
+                  bool iswrite);
+
 ssize_t vfio_pci_bar_rw(struct vfio_pci_core_device *vdev, char __user *buf,
             size_t count, loff_t *ppos, bool iswrite);
 
+6 -11
drivers/vfio/pci/virtio/migrate.c
···
     struct vfio_precopy_info info = {};
     loff_t *pos = &filp->f_pos;
     bool end_of_data = false;
-    unsigned long minsz;
     u32 ctx_size = 0;
     int ret;
 
-    if (cmd != VFIO_MIG_GET_PRECOPY_INFO)
-        return -ENOTTY;
-
-    minsz = offsetofend(struct vfio_precopy_info, dirty_bytes);
-    if (copy_from_user(&info, (void __user *)arg, minsz))
-        return -EFAULT;
-
-    if (info.argsz < minsz)
-        return -EINVAL;
+    ret = vfio_check_precopy_ioctl(&virtvdev->core_device.vdev, cmd, arg,
+                       &info);
+    if (ret)
+        return ret;
 
     mutex_lock(&virtvdev->state_mutex);
     if (virtvdev->mig_state != VFIO_DEVICE_STATE_PRE_COPY &&
···
 
 done:
     virtiovf_state_mutex_unlock(virtvdev);
-    if (copy_to_user((void __user *)arg, &info, minsz))
+    if (copy_to_user((void __user *)arg, &info,
+             offsetofend(struct vfio_precopy_info, dirty_bytes)))
         return -EFAULT;
     return 0;
 
+26 -18
drivers/vfio/pci/xe/main.c
···
 static void xe_vfio_pci_migration_init(struct xe_vfio_pci_core_device *xe_vdev)
 {
     struct vfio_device *core_vdev = &xe_vdev->core_device.vdev;
-    struct pci_dev *pdev = to_pci_dev(core_vdev->dev);
-    struct xe_device *xe = xe_sriov_vfio_get_pf(pdev);
 
-    if (!xe)
+    if (!xe_sriov_vfio_migration_supported(xe_vdev->xe))
         return;
-    if (!xe_sriov_vfio_migration_supported(xe))
-        return;
-
-    mutex_init(&xe_vdev->state_mutex);
-    spin_lock_init(&xe_vdev->reset_lock);
-
-    /* PF internal control uses vfid index starting from 1 */
-    xe_vdev->vfid = pci_iov_vf_id(pdev) + 1;
-    xe_vdev->xe = xe;
 
     core_vdev->migration_flags = VFIO_MIGRATION_STOP_COPY | VFIO_MIGRATION_P2P;
     core_vdev->mig_ops = &xe_vfio_pci_migration_ops;
 }
 
-static void xe_vfio_pci_migration_fini(struct xe_vfio_pci_core_device *xe_vdev)
+static int xe_vfio_pci_vf_init(struct xe_vfio_pci_core_device *xe_vdev)
 {
-    if (!xe_vdev->vfid)
-        return;
+    struct vfio_device *core_vdev = &xe_vdev->core_device.vdev;
+    struct pci_dev *pdev = to_pci_dev(core_vdev->dev);
+    struct xe_device *xe = xe_sriov_vfio_get_pf(pdev);
 
-    mutex_destroy(&xe_vdev->state_mutex);
+    if (!pdev->is_virtfn)
+        return 0;
+    if (!xe)
+        return -ENODEV;
+    xe_vdev->xe = xe;
+
+    /* PF internal control uses vfid index starting from 1 */
+    xe_vdev->vfid = pci_iov_vf_id(pdev) + 1;
+
+    xe_vfio_pci_migration_init(xe_vdev);
+
+    return 0;
 }
 
 static int xe_vfio_pci_init_dev(struct vfio_device *core_vdev)
 {
     struct xe_vfio_pci_core_device *xe_vdev =
         container_of(core_vdev, struct xe_vfio_pci_core_device, core_device.vdev);
+    int ret;
 
-    xe_vfio_pci_migration_init(xe_vdev);
+    mutex_init(&xe_vdev->state_mutex);
+    spin_lock_init(&xe_vdev->reset_lock);
+
+    ret = xe_vfio_pci_vf_init(xe_vdev);
+    if (ret)
+        return ret;
 
     return vfio_pci_core_init_dev(core_vdev);
 }
···
     struct xe_vfio_pci_core_device *xe_vdev =
         container_of(core_vdev, struct xe_vfio_pci_core_device, core_device.vdev);
 
-    xe_vfio_pci_migration_fini(xe_vdev);
+    mutex_destroy(&xe_vdev->state_mutex);
+    vfio_pci_core_release_dev(core_vdev);
 }
 
 static const struct vfio_device_ops xe_vfio_pci_ops = {
-1
drivers/vfio/vfio.h
···
     struct mutex group_lock;
     struct kvm *kvm;
     struct file *opened_file;
-    struct blocking_notifier_head notifier;
     struct iommufd_ctx *iommufd;
     spinlock_t kvm_ref_lock;
     unsigned int cdev_device_open_cnt;
-1
drivers/vfio/vfio_iommu_type1.c
···
 #include <linux/uaccess.h>
 #include <linux/vfio.h>
 #include <linux/workqueue.h>
-#include <linux/notifier.h>
 #include <linux/mm_inline.h>
 #include <linux/overflow.h>
 #include "vfio.h"
+21
drivers/vfio/vfio_main.c
···
         vfio_df_iommufd_unbind(df);
     else
         vfio_device_group_unuse_iommu(device);
+    device->precopy_info_v2 = 0;
     module_put(device->dev->driver->owner);
 }
 
···
     return 0;
 }
 
+static int
+vfio_ioctl_device_feature_migration_precopy_info_v2(struct vfio_device *device,
+                            u32 flags, size_t argsz)
+{
+    int ret;
+
+    if (!(device->migration_flags & VFIO_MIGRATION_PRE_COPY))
+        return -EINVAL;
+
+    ret = vfio_check_feature(flags, argsz, VFIO_DEVICE_FEATURE_SET, 0);
+    if (ret != 1)
+        return ret;
+
+    device->precopy_info_v2 = 1;
+    return 0;
+}
+
 static int vfio_ioctl_device_feature_migration(struct vfio_device *device,
                            u32 flags, void __user *arg,
                            size_t argsz)
···
         return vfio_ioctl_device_feature_migration_data_size(
             device, feature.flags, arg->data,
             feature.argsz - minsz);
+    case VFIO_DEVICE_FEATURE_MIG_PRECOPY_INFOv2:
+        return vfio_ioctl_device_feature_migration_precopy_info_v2(
+            device, feature.flags, feature.argsz - minsz);
     default:
         if (unlikely(!device->ops->device_feature))
             return -ENOTTY;
+14 -2
include/linux/mlx5/mlx5_ifc.h
···
     u8 sf_eq_usage[0x1];
     u8 reserved_at_d3[0x5];
     u8 multiplane[0x1];
-    u8 reserved_at_d9[0x7];
+    u8 migration_state[0x1];
+    u8 reserved_at_da[0x6];
 
     u8 cross_vhca_object_to_object_supported[0x20];
 
···
     u8 reserved_at_60[0x20];
 };
 
+enum {
+    MLX5_QUERY_VHCA_MIG_STATE_UNINITIALIZED = 0x0,
+    MLX5_QUERY_VHCA_MIG_STATE_OPER_MIGRATION_IDLE = 0x1,
+    MLX5_QUERY_VHCA_MIG_STATE_OPER_MIGRATION_READY = 0x2,
+    MLX5_QUERY_VHCA_MIG_STATE_OPER_MIGRATION_DIRTY = 0x3,
+    MLX5_QUERY_VHCA_MIG_STATE_OPER_MIGRATION_INIT = 0x4,
+};
+
 struct mlx5_ifc_query_vhca_migration_state_out_bits {
     u8 status[0x8];
     u8 reserved_at_8[0x18];
 
     u8 syndrome[0x20];
 
-    u8 reserved_at_40[0x40];
+    u8 reserved_at_40[0x20];
+
+    u8 migration_state[0x4];
+    u8 reserved_at_64[0x1c];
 
     u8 required_umem_size[0x20];
 
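The driver reads the new 4-bit field with its usual MLX5_GET() accessor; mlx5_ifc layouts are big-endian with fields declared MSB-first, so migration_state[0x4] at bit offset 0x60 lands in the top nibble of the host-order dword at byte offset 0xc of the command output. A standalone model of that extraction (the helper name is hypothetical, only the bit layout comes from the structure above):

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors the new enum values from mlx5_ifc.h */
enum {
	MIG_STATE_UNINITIALIZED   = 0x0,
	MIG_STATE_MIGRATION_IDLE  = 0x1,
	MIG_STATE_MIGRATION_READY = 0x2,
	MIG_STATE_MIGRATION_DIRTY = 0x3,
	MIG_STATE_MIGRATION_INIT  = 0x4,
};

/*
 * Model of MLX5_GET(query_vhca_migration_state_out, out, migration_state):
 * after byte-swapping the dword at byte offset 0xc to host order, the
 * 4-bit field occupies the most significant bits, so shift right by
 * 32 - 4 = 28 and mask to the field width.
 */
static unsigned int mig_state_from_dword(uint32_t dword_host_order)
{
	return (dword_host_order >> 28) & 0xf;
}
```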
+40 -2
include/linux/vfio.h
···
 #include <linux/cdev.h>
 #include <uapi/linux/vfio.h>
 #include <linux/iova_bitmap.h>
+#include <linux/uaccess.h>
 
 struct kvm;
 struct iommufd_ctx;
···
     struct vfio_device_set *dev_set;
     struct list_head dev_set_list;
     unsigned int migration_flags;
+    u8 precopy_info_v2;
     struct kvm *kvm;
 
     /* Members below here are private, not for driver use */
···
     u8 iommufd_attached:1;
 #endif
     u8 cdev_opened:1;
-#ifdef CONFIG_DEBUG_FS
     /*
      * debug_root is a static property of the vfio_device
      * which must be set prior to registering the vfio_device.
      */
     struct dentry *debug_root;
-#endif
 };
 
 /**
···
     if (argsz < minsz)
         return -EINVAL;
     return 1;
+}
+
+/**
+ * vfio_check_precopy_ioctl - Validate user input for the VFIO_MIG_GET_PRECOPY_INFO ioctl
+ * @vdev: The vfio device
+ * @cmd: Cmd from the ioctl
+ * @arg: Arg from the ioctl
+ * @info: Driver pointer to hold the userspace input to the ioctl
+ *
+ * For use in a driver's get_precopy_info. Checks that the inputs to the
+ * VFIO_MIG_GET_PRECOPY_INFO ioctl are correct.
+ *
+ * Returns 0 on success, otherwise errno.
+ */
+static inline int
+vfio_check_precopy_ioctl(struct vfio_device *vdev, unsigned int cmd,
+             unsigned long arg, struct vfio_precopy_info *info)
+{
+    unsigned long minsz;
+
+    if (cmd != VFIO_MIG_GET_PRECOPY_INFO)
+        return -ENOTTY;
+
+    minsz = offsetofend(struct vfio_precopy_info, dirty_bytes);
+
+    if (copy_from_user(info, (void __user *)arg, minsz))
+        return -EFAULT;
+
+    if (info->argsz < minsz)
+        return -EINVAL;
+
+    /* Keep v1 behaviour as is for compatibility reasons */
+    if (vdev->precopy_info_v2)
+        /* flags are output, set its initial value to 0 */
+        info->flags = 0;
+
+    return 0;
 }
 
 struct vfio_device *_vfio_alloc_device(size_t size, struct device *dev,
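The validation order the new helper enforces (-ENOTTY for a foreign cmd, then the copy-in and argsz checks, then the v2-only flags reset) can be modelled in plain userspace C. The struct layout and error values mirror the patch; the `check_precopy()` name and passing the struct directly instead of through copy_from_user() are stand-ins for illustration:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Mirrors struct vfio_precopy_info from the uAPI */
struct precopy_info {
	uint32_t argsz;
	uint32_t flags;
	uint64_t initial_bytes;
	uint64_t dirty_bytes;
};

#define offsetofend(T, m) (offsetof(T, m) + sizeof(((T *)0)->m))

/*
 * Stand-in for vfio_check_precopy_ioctl(): reject foreign cmds, require
 * argsz to cover the fixed fields, and clear the output flags field only
 * when the device opted in to the v2 behaviour.
 */
static int check_precopy(unsigned int cmd, unsigned int expected_cmd,
			 struct precopy_info *info, int precopy_info_v2)
{
	size_t minsz = offsetofend(struct precopy_info, dirty_bytes);

	if (cmd != expected_cmd)
		return -ENOTTY;
	if (info->argsz < minsz)
		return -EINVAL;
	if (precopy_info_v2)
		info->flags = 0; /* flags are output in v2 */
	return 0;
}
```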
+25 -1
include/uapi/linux/vfio.h
···
  *
  * Retrieve information about the group. Fills in provided
  * struct vfio_group_info. Caller sets argsz.
- * Return: 0 on succes, -errno on failure.
+ * Return: 0 on success, -errno on failure.
  * Availability: Always
  */
 struct vfio_group_status {
···
  * The initial_bytes field indicates the amount of initial precopy
  * data available from the device. This field should have a non-zero initial
  * value and decrease as migration data is read from the device.
+ * The presence of the VFIO_PRECOPY_INFO_REINIT output flag indicates
+ * that new initial data is present on the stream.
+ * The new initial data may result, for example, from device reconfiguration
+ * during migration that requires additional initialization data.
+ * In that case initial_bytes may report a non-zero value irrespective of
+ * any previously reported values, which progresses towards zero as precopy
+ * data is read from the data stream. dirty_bytes is also reset
+ * to zero and represents the state change of the device relative to the new
+ * initial_bytes.
+ * VFIO_PRECOPY_INFO_REINIT can be reported only after userspace opts in to
+ * VFIO_DEVICE_FEATURE_MIG_PRECOPY_INFOv2. Without this opt-in, the flags field
+ * of struct vfio_precopy_info is reserved for bug-compatibility reasons.
+ *
  * It is recommended to leave PRE_COPY for STOP_COPY only after this field
  * reaches zero. Leaving PRE_COPY earlier might make things slower.
  *
···
 struct vfio_precopy_info {
     __u32 argsz;
     __u32 flags;
+#define VFIO_PRECOPY_INFO_REINIT (1 << 0) /* output - new initial data is present */
     __aligned_u64 initial_bytes;
     __aligned_u64 dirty_bytes;
 };
···
     __u32 nr_ranges;
     struct vfio_region_dma_range dma_ranges[] __counted_by(nr_ranges);
 };
+
+/*
+ * Enables the migration precopy_info_v2 behaviour.
+ *
+ * VFIO_DEVICE_FEATURE_MIG_PRECOPY_INFOv2.
+ *
+ * On SET, enables the v2 pre_copy_info behaviour, where the
+ * vfio_precopy_info.flags is a valid output field.
+ */
+#define VFIO_DEVICE_FEATURE_MIG_PRECOPY_INFOv2 12
 
 /* -------- API for Type1 VFIO IOMMU -------- */
 
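A sketch of how a userspace migration manager might consume the new flag, assuming it already issued the VFIO_DEVICE_FEATURE_MIG_PRECOPY_INFOv2 opt-in; the `precopy_converged()` policy helper is illustrative, only the flag semantics come from the uAPI text above:

```c
#include <assert.h>
#include <stdint.h>

#define VFIO_PRECOPY_INFO_REINIT (1u << 0) /* matches the uAPI bit */

/* One VFIO_MIG_GET_PRECOPY_INFO result, as seen by the manager */
struct precopy_sample {
	uint32_t flags;
	uint64_t initial_bytes;
	uint64_t dirty_bytes;
};

/*
 * Hypothetical policy: precopy is ready for the switch to STOP_COPY once
 * no initial data is pending. On REINIT the manager must keep reading in
 * PRE_COPY and restart its progress accounting, since initial_bytes and
 * dirty_bytes are reset relative to the new initial state.
 */
static int precopy_converged(const struct precopy_sample *s)
{
	if (s->flags & VFIO_PRECOPY_INFO_REINIT)
		return 0; /* new initial data appeared on the stream */
	return s->initial_bytes == 0;
}
```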
+14 -19
samples/vfio-mdev/mtty.c
···
  * Global Structures
  */
 
+static const struct class mtty_class = {
+    .name = MTTY_CLASS_NAME
+};
+
 static struct mtty_dev {
     dev_t vd_devt;
-    struct class *vd_class;
     struct cdev vd_cdev;
     struct idr vd_idr;
     struct device dev;
···
     struct mdev_state *mdev_state = migf->mdev_state;
     loff_t *pos = &filp->f_pos;
     struct vfio_precopy_info info = {};
-    unsigned long minsz;
     int ret;
 
-    if (cmd != VFIO_MIG_GET_PRECOPY_INFO)
-        return -ENOTTY;
-
-    minsz = offsetofend(struct vfio_precopy_info, dirty_bytes);
-
-    if (copy_from_user(&info, (void __user *)arg, minsz))
-        return -EFAULT;
-    if (info.argsz < minsz)
-        return -EINVAL;
+    ret = vfio_check_precopy_ioctl(&mdev_state->vdev, cmd, arg, &info);
+    if (ret)
+        return ret;
 
     mutex_lock(&mdev_state->state_mutex);
     if (mdev_state->state != VFIO_DEVICE_STATE_PRE_COPY &&
···
     info.initial_bytes = migf->filled_size - *pos;
     mutex_unlock(&migf->lock);
 
-    ret = copy_to_user((void __user *)arg, &info, minsz) ? -EFAULT : 0;
+    ret = copy_to_user((void __user *)arg, &info,
+               offsetofend(struct vfio_precopy_info, dirty_bytes)) ?
+               -EFAULT : 0;
 unlock:
     mtty_state_mutex_unlock(mdev_state);
     return ret;
···
     if (ret)
         goto err_cdev;
 
-    mtty_dev.vd_class = class_create(MTTY_CLASS_NAME);
+    ret = class_register(&mtty_class);
 
-    if (IS_ERR(mtty_dev.vd_class)) {
+    if (ret) {
         pr_err("Error: failed to register mtty_dev class\n");
-        ret = PTR_ERR(mtty_dev.vd_class);
         goto err_driver;
     }
 
-    mtty_dev.dev.class = mtty_dev.vd_class;
+    mtty_dev.dev.class = &mtty_class;
     mtty_dev.dev.release = mtty_device_release;
     dev_set_name(&mtty_dev.dev, "%s", MTTY_NAME);
 
···
     device_del(&mtty_dev.dev);
 err_put:
     put_device(&mtty_dev.dev);
-    class_destroy(mtty_dev.vd_class);
+    class_unregister(&mtty_class);
 err_driver:
     mdev_unregister_driver(&mtty_driver);
 err_cdev:
···
     mdev_unregister_driver(&mtty_driver);
     cdev_del(&mtty_dev.vd_cdev);
     unregister_chrdev_region(mtty_dev.vd_devt, MINORMASK + 1);
-    class_destroy(mtty_dev.vd_class);
-    mtty_dev.vd_class = NULL;
+    class_unregister(&mtty_class);
     pr_info("mtty_dev: Unloaded!\n");
 }
 
+1 -1
tools/testing/selftests/vfio/Makefile
···
 ARCH ?= $(shell uname -m)
 
-ifeq (,$(filter $(ARCH),arm64 x86_64))
+ifeq (,$(filter $(ARCH),aarch64 arm64 x86_64))
 # Do nothing on unsupported architectures
 include ../lib.mk
 else
+13 -2
tools/testing/selftests/vfio/lib/drivers/dsa/dsa.c
···
 
 static int dsa_probe(struct vfio_pci_device *device)
 {
-    if (!vfio_pci_device_match(device, PCI_VENDOR_ID_INTEL,
-                   PCI_DEVICE_ID_INTEL_DSA_SPR0))
+    const u16 vendor_id = vfio_pci_config_readw(device, PCI_VENDOR_ID);
+    const u16 device_id = vfio_pci_config_readw(device, PCI_DEVICE_ID);
+
+    if (vendor_id != PCI_VENDOR_ID_INTEL)
         return -EINVAL;
+
+    switch (device_id) {
+    case PCI_DEVICE_ID_INTEL_DSA_SPR0:
+    case PCI_DEVICE_ID_INTEL_DSA_DMR:
+    case PCI_DEVICE_ID_INTEL_DSA_GNRD:
+        break;
+    default:
+        return -EINVAL;
+    }
 
     if (dsa_int_handle_request_required(device)) {
         dev_err(device, "Device requires requesting interrupt handles\n");
+3 -1
tools/testing/selftests/vfio/lib/vfio_pci_device.c
···
 static void vfio_pci_irq_set(struct vfio_pci_device *device,
                  u32 index, u32 vector, u32 count, int *fds)
 {
-    u8 buf[sizeof(struct vfio_irq_set) + sizeof(int) * count] = {};
+    u8 buf[sizeof(struct vfio_irq_set) + sizeof(int) * count];
     struct vfio_irq_set *irq = (void *)&buf;
     int *irq_fds = (void *)&irq->data;
+
+    memset(buf, 0, sizeof(buf));
 
     irq->argsz = sizeof(buf);
     irq->flags = VFIO_IRQ_SET_ACTION_TRIGGER;
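The change above is needed because ISO C forbids an initializer on a variable-length array (`= {}` on a VLA is a constraint violation that stricter compilers reject); declaring the VLA and then memset-ing it is the portable equivalent. A minimal standalone illustration of the pattern, with a hypothetical helper name:

```c
#include <assert.h>
#include <string.h>

/*
 * Zero a VLA the portable way: declare it without an initializer
 * (ISO C does not allow one on variable-length arrays), then memset.
 * Returns 1 if every byte of the buffer reads back as zero.
 */
static int vla_is_zeroed(unsigned int count)
{
	unsigned char buf[16 + sizeof(int) * count]; /* VLA: no "= {}" allowed */

	memset(buf, 0, sizeof(buf));

	for (unsigned int i = 0; i < sizeof(buf); i++)
		if (buf[i] != 0)
			return 0;
	return 1;
}
```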
-1
tools/testing/selftests/vfio/vfio_dma_mapping_mmio_test.c
···
         iommu_unmap(iommu, &region);
     } else {
         VFIO_ASSERT_NE(__iommu_map(iommu, &region), 0);
-        VFIO_ASSERT_NE(__iommu_unmap(iommu, &region, NULL), 0);
     }
 }
 