Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

vfio/pci: Add dma-buf export support for MMIO regions

Add support for exporting PCI device MMIO regions through dma-buf,
enabling safe sharing of non-struct page memory with controlled
lifetime management. This allows RDMA and other subsystems to import
dma-buf FDs and build them into memory regions for PCI P2P operations.

The implementation provides a revocable attachment mechanism using
dma-buf move operations. MMIO regions are normally pinned, as BARs
don't change physical addresses, but access is revoked when the VFIO
device is closed or a PCI reset is issued. This ensures kernel
self-defense against potentially hostile userspace.

Currently VFIO can take MMIO regions from the device's BAR and map
them into a PFNMAP VMA with special PTEs. This mapping type ensures
the memory cannot be used with things like pin_user_pages(), hmm, and
so on. In practice only the user process CPU and KVM can safely make
use of these VMAs. When VFIO shuts down, these VMAs are cleaned by
unmap_mapping_range() to prevent any UAF of the MMIO beyond driver
unbind.
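
For illustration, a minimal sketch of how MMIO typically ends up in such
a PFNMAP VMA (hypothetical mmap handler, not the actual VFIO fault path;
example_mmio_mmap() and the pdev lookup are assumptions):

/* Hypothetical sketch: remap_pfn_range() marks the VMA VM_IO | VM_PFNMAP,
 * which is what makes pin_user_pages(), hmm and friends refuse it. */
static int example_mmio_mmap(struct file *file, struct vm_area_struct *vma)
{
        struct pci_dev *pdev = file->private_data;      /* assumed lookup */
        phys_addr_t start = pci_resource_start(pdev, 0);

        vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
        return remap_pfn_range(vma, vma->vm_start, start >> PAGE_SHIFT,
                               vma->vm_end - vma->vm_start,
                               vma->vm_page_prot);
}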

However, VFIO type 1 has an insecure behavior where it uses
follow_pfnmap_*() to fish an MMIO PFN out of a VMA and program it back
into the IOMMU. This has a long history of enabling P2P DMA inside
VMs, but has serious lifetime problems by allowing a UAF of the MMIO
after the VFIO driver has been unbound.

Introduce DMABUF as a new, safe way to export an FD-based handle for
the MMIO regions. This can be consumed by existing DMABUF importers
like RDMA or DRM without opening a UAF. A following series will add an
importer to iommufd to obsolete the type 1 code and allow safe,
UAF-free MMIO P2P in VM cases.

DMABUF has a built-in synchronous invalidation mechanism called
move_notify. VFIO keeps track of all drivers importing its MMIO and
can invoke a synchronous invalidation callback to tell the importing
drivers to DMA unmap and forget about the MMIO PFNs. This process is
called revoke. This synchronous invalidation fully prevents any
lifecycle problems. VFIO will do this before unbinding its driver,
ensuring there is no UAF of the MMIO beyond the driver lifecycle.
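
For context, the importer side of a revoke looks roughly like this
(hypothetical importer; the example_* names are made up, but the
dma_buf_attach_ops interface is the standard dynamic-importer API):

struct example_importer {
        struct sg_table *sgt;
};

/* Called by the exporter (here VFIO) with the dma_resv lock held; the
 * importer must quiesce DMA and forget the MMIO PFNs before returning. */
static void example_move_notify(struct dma_buf_attachment *attach)
{
        struct example_importer *imp = attach->importer_priv;

        example_stop_dma(imp);                  /* assumed HW quiesce */
        dma_buf_unmap_attachment(attach, imp->sgt, DMA_BIDIRECTIONAL);
        imp->sgt = NULL;
}

static const struct dma_buf_attach_ops example_attach_ops = {
        .allow_peer2peer = true,                /* required for VFIO MMIO */
        .move_notify = example_move_notify,
};

/* attach = dma_buf_dynamic_attach(dmabuf, dev, &example_attach_ops, imp); */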

Further, VFIO has additional behavior to block access to the MMIO
during things like Function Level Reset (FLR). This is because some
poor platforms may experience an MCE-type crash when touching the MMIO
of a PCI device that is undergoing a reset. Today this is done by
using unmap_mapping_range() on the VMAs. Extend that into the DMABUF
world and temporarily revoke the MMIO from the DMABUF importers during
FLR as well. This more robustly prevents an errant P2P access from
upsetting the platform.

A DMABUF FD is a preferred handle for MMIO compared to using something
like a pgmap because:
- VFIO is supported, including its P2P feature, on archs that don't
  support pgmap
- PCI devices have all sorts of BAR sizes, including ones smaller
  than a section, so a pgmap cannot always be created
- It is undesirable to waste a lot of memory on struct pages,
  especially for a case like a GPU with ~100GB of BAR size
- We want a synchronous revoke semantic to support FLR with light
  hardware requirements

Use the P2P subsystem to help generate the DMA mapping. This is a
significant upgrade over the abuse of dma_map_resource() that DMABUF
exporters have historically relied on. Experience with an out-of-tree
version of this patch shows that real systems do need this. The
approach deals with all the P2P scenarios (a sketch of the resulting
mapping decision follows this list):
- Non-zero PCI bus_offset
- ACS flags routing traffic to the IOMMU
- ACS flags that bypass the IOMMU - though vfio noiommu is required
to hit this.
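
A rough sketch of that per-range decision (illustrative only; the
pci_p2pdma_bus_addr_map() and dma_map_phys() helpers and their
signatures are assumptions based on the new phys-addr P2P API, and the
real logic lives behind dma_buf_phys_vec_to_sgt()):

/* Illustrative sketch, not the in-tree implementation. */
static dma_addr_t example_map_one(struct device *importer,
                                  struct p2pdma_provider *provider,
                                  enum pci_p2pdma_map_type type,
                                  phys_addr_t paddr, size_t len)
{
        switch (type) {
        case PCI_P2PDMA_MAP_BUS_ADDR:
                /* ACS routes traffic directly between the endpoints: use
                 * the PCI bus address (this applies any bus_offset) and
                 * skip the IOMMU. */
                return pci_p2pdma_bus_addr_map(provider, paddr);
        case PCI_P2PDMA_MAP_THRU_HOST_BRIDGE:
                /* ACS forces traffic up through the host bridge: map via
                 * the IOMMU like ordinary memory. */
                return dma_map_phys(importer, paddr, len,
                                    DMA_BIDIRECTIONAL, 0);
        default:
                return DMA_MAPPING_ERROR;
        }
}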

There will be further work to formalize the revoke semantic in
DMABUF. For now this acts like a move_notify dynamic exporter where
importers will get a failure when they attempt to map after a revoke.
This means that only importers capable of fully restartable faults can
import the VFIO DMABUFs. A future revoke semantic should open this up
to more HW, as the HW then only needs to invalidate, not handle
restartable faults.
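
Concretely, a restartable importer's mapping attempt would look
something like this (hypothetical sketch; example_try_map() is made up):

/* Returns ERR_PTR(-ENODEV) while revoked; the importer must park the
 * faulting request and retry after a later move_notify() re-enables it. */
static struct sg_table *example_try_map(struct dma_buf_attachment *attach)
{
        struct sg_table *sgt;

        dma_resv_lock(attach->dmabuf->resv, NULL);
        sgt = dma_buf_map_attachment(attach, DMA_BIDIRECTIONAL);
        dma_resv_unlock(attach->dmabuf->resv);
        return sgt;
}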

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Vivek Kasireddy <vivek.kasireddy@intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Acked-by: Ankit Agrawal <ankita@nvidia.com>
Link: https://lore.kernel.org/r/20251120-dmabuf-vfio-v9-10-d7f71607f371@nvidia.com
Signed-off-by: Alex Williamson <alex@shazbot.org>

Authored by Leon Romanovsky, committed by Alex Williamson
5d74781e 35c35039

+453 -5
drivers/vfio/pci/Kconfig (+3)
···
 
           To enable s390x KVM vfio-pci extensions, say Y.
 
+config VFIO_PCI_DMABUF
+        def_bool y if VFIO_PCI_CORE && PCI_P2PDMA && DMA_SHARED_BUFFER
+
 source "drivers/vfio/pci/mlx5/Kconfig"
 
 source "drivers/vfio/pci/hisilicon/Kconfig"
drivers/vfio/pci/Makefile (+1)
···
 vfio-pci-core-y := vfio_pci_core.o vfio_pci_intrs.o vfio_pci_rdwr.o vfio_pci_config.o
 vfio-pci-core-$(CONFIG_VFIO_PCI_ZDEV_KVM) += vfio_pci_zdev.o
+vfio-pci-core-$(CONFIG_VFIO_PCI_DMABUF) += vfio_pci_dmabuf.o
 obj-$(CONFIG_VFIO_PCI_CORE) += vfio-pci-core.o
 
 vfio-pci-y := vfio_pci.o
drivers/vfio/pci/vfio_pci.c (+5)
···
         .pasid_detach_ioas = vfio_iommufd_physical_pasid_detach_ioas,
 };
 
+static const struct vfio_pci_device_ops vfio_pci_dev_ops = {
+        .get_dmabuf_phys = vfio_pci_core_get_dmabuf_phys,
+};
+
 static int vfio_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 {
         struct vfio_pci_core_device *vdev;
···
                 return PTR_ERR(vdev);
 
         dev_set_drvdata(&pdev->dev, vdev);
+        vdev->pci_ops = &vfio_pci_dev_ops;
         ret = vfio_pci_core_register_device(vdev);
         if (ret)
                 goto out_put_vdev;
drivers/vfio/pci/vfio_pci_config.c (+18 -4)
···
         virt_mem = !!(le16_to_cpu(*virt_cmd) & PCI_COMMAND_MEMORY);
         new_mem = !!(new_cmd & PCI_COMMAND_MEMORY);
 
-        if (!new_mem)
+        if (!new_mem) {
                 vfio_pci_zap_and_down_write_memory_lock(vdev);
-        else
+                vfio_pci_dma_buf_move(vdev, true);
+        } else {
                 down_write(&vdev->memory_lock);
+        }
 
         /*
          * If the user is writing mem/io enable (new_mem/io) and we
···
         *virt_cmd &= cpu_to_le16(~mask);
         *virt_cmd |= cpu_to_le16(new_cmd & mask);
 
+        if (__vfio_pci_memory_enabled(vdev))
+                vfio_pci_dma_buf_move(vdev, false);
         up_write(&vdev->memory_lock);
 }
···
 static void vfio_lock_and_set_power_state(struct vfio_pci_core_device *vdev,
                                           pci_power_t state)
 {
-        if (state >= PCI_D3hot)
+        if (state >= PCI_D3hot) {
                 vfio_pci_zap_and_down_write_memory_lock(vdev);
-        else
+                vfio_pci_dma_buf_move(vdev, true);
+        } else {
                 down_write(&vdev->memory_lock);
+        }
 
         vfio_pci_set_power_state(vdev, state);
+        if (__vfio_pci_memory_enabled(vdev))
+                vfio_pci_dma_buf_move(vdev, false);
         up_write(&vdev->memory_lock);
 }
···
         if (!ret && (cap & PCI_EXP_DEVCAP_FLR)) {
                 vfio_pci_zap_and_down_write_memory_lock(vdev);
+                vfio_pci_dma_buf_move(vdev, true);
                 pci_try_reset_function(vdev->pdev);
+                if (__vfio_pci_memory_enabled(vdev))
+                        vfio_pci_dma_buf_move(vdev, false);
                 up_write(&vdev->memory_lock);
         }
 }
···
         if (!ret && (cap & PCI_AF_CAP_FLR) && (cap & PCI_AF_CAP_TP)) {
                 vfio_pci_zap_and_down_write_memory_lock(vdev);
+                vfio_pci_dma_buf_move(vdev, true);
                 pci_try_reset_function(vdev->pdev);
+                if (__vfio_pci_memory_enabled(vdev))
+                        vfio_pci_dma_buf_move(vdev, false);
                 up_write(&vdev->memory_lock);
         }
 }
drivers/vfio/pci/vfio_pci_core.c (+17 -1)
···
          * semaphore.
          */
         vfio_pci_zap_and_down_write_memory_lock(vdev);
+        vfio_pci_dma_buf_move(vdev, true);
+
         if (vdev->pm_runtime_engaged) {
                 up_write(&vdev->memory_lock);
                 return -EINVAL;
···
          */
         down_write(&vdev->memory_lock);
         __vfio_pci_runtime_pm_exit(vdev);
+        if (__vfio_pci_memory_enabled(vdev))
+                vfio_pci_dma_buf_move(vdev, false);
         up_write(&vdev->memory_lock);
 }
···
         eeh_dev_release(vdev->pdev);
 #endif
         vfio_pci_core_disable(vdev);
+
+        vfio_pci_dma_buf_cleanup(vdev);
 
         mutex_lock(&vdev->igate);
         if (vdev->err_trigger) {
···
          */
         vfio_pci_set_power_state(vdev, PCI_D0);
 
+        vfio_pci_dma_buf_move(vdev, true);
         ret = pci_try_reset_function(vdev->pdev);
+        if (__vfio_pci_memory_enabled(vdev))
+                vfio_pci_dma_buf_move(vdev, false);
         up_write(&vdev->memory_lock);
 
         return ret;
···
                 return vfio_pci_core_pm_exit(vdev, flags, arg, argsz);
         case VFIO_DEVICE_FEATURE_PCI_VF_TOKEN:
                 return vfio_pci_core_feature_token(vdev, flags, arg, argsz);
+        case VFIO_DEVICE_FEATURE_DMA_BUF:
+                return vfio_pci_core_feature_dma_buf(vdev, flags, arg, argsz);
         default:
                 return -ENOTTY;
         }
···
         ret = pcim_p2pdma_init(vdev->pdev);
         if (ret && ret != -EOPNOTSUPP)
                 return ret;
+        INIT_LIST_HEAD(&vdev->dmabufs);
         init_rwsem(&vdev->memory_lock);
         xa_init(&vdev->ctx);
···
                         break;
         }
 
+        vfio_pci_dma_buf_move(vdev, true);
         vfio_pci_zap_bars(vdev);
 }
···
 err_undo:
         list_for_each_entry_from_reverse(vdev, &dev_set->device_list,
-                                         vdev.dev_set_list)
+                                         vdev.dev_set_list) {
+                if (vdev->vdev.open_count && __vfio_pci_memory_enabled(vdev))
+                        vfio_pci_dma_buf_move(vdev, false);
                 up_write(&vdev->memory_lock);
+        }
 
         list_for_each_entry(vdev, &dev_set->device_list, vdev.dev_set_list)
                 pm_runtime_put(&vdev->pdev->dev);
drivers/vfio/pci/vfio_pci_dmabuf.c (+316, new file)
// SPDX-License-Identifier: GPL-2.0-only
/* Copyright (c) 2025, NVIDIA CORPORATION & AFFILIATES.
 */
#include <linux/dma-buf-mapping.h>
#include <linux/pci-p2pdma.h>
#include <linux/dma-resv.h>

#include "vfio_pci_priv.h"

MODULE_IMPORT_NS("DMA_BUF");

struct vfio_pci_dma_buf {
        struct dma_buf *dmabuf;
        struct vfio_pci_core_device *vdev;
        struct list_head dmabufs_elm;
        size_t size;
        struct dma_buf_phys_vec *phys_vec;
        struct p2pdma_provider *provider;
        u32 nr_ranges;
        u8 revoked : 1;
};

static int vfio_pci_dma_buf_attach(struct dma_buf *dmabuf,
                                   struct dma_buf_attachment *attachment)
{
        struct vfio_pci_dma_buf *priv = dmabuf->priv;

        if (!attachment->peer2peer)
                return -EOPNOTSUPP;

        if (priv->revoked)
                return -ENODEV;

        return 0;
}

static struct sg_table *
vfio_pci_dma_buf_map(struct dma_buf_attachment *attachment,
                     enum dma_data_direction dir)
{
        struct vfio_pci_dma_buf *priv = attachment->dmabuf->priv;

        dma_resv_assert_held(priv->dmabuf->resv);

        if (priv->revoked)
                return ERR_PTR(-ENODEV);

        return dma_buf_phys_vec_to_sgt(attachment, priv->provider,
                                       priv->phys_vec, priv->nr_ranges,
                                       priv->size, dir);
}

static void vfio_pci_dma_buf_unmap(struct dma_buf_attachment *attachment,
                                   struct sg_table *sgt,
                                   enum dma_data_direction dir)
{
        dma_buf_free_sgt(attachment, sgt, dir);
}

static void vfio_pci_dma_buf_release(struct dma_buf *dmabuf)
{
        struct vfio_pci_dma_buf *priv = dmabuf->priv;

        /*
         * Either this or vfio_pci_dma_buf_cleanup() will remove from the list.
         * The refcount prevents both.
         */
        if (priv->vdev) {
                down_write(&priv->vdev->memory_lock);
                list_del_init(&priv->dmabufs_elm);
                up_write(&priv->vdev->memory_lock);
                vfio_device_put_registration(&priv->vdev->vdev);
        }
        kfree(priv->phys_vec);
        kfree(priv);
}

static const struct dma_buf_ops vfio_pci_dmabuf_ops = {
        .attach = vfio_pci_dma_buf_attach,
        .map_dma_buf = vfio_pci_dma_buf_map,
        .unmap_dma_buf = vfio_pci_dma_buf_unmap,
        .release = vfio_pci_dma_buf_release,
};

int vfio_pci_core_fill_phys_vec(struct dma_buf_phys_vec *phys_vec,
                                struct vfio_region_dma_range *dma_ranges,
                                size_t nr_ranges, phys_addr_t start,
                                phys_addr_t len)
{
        phys_addr_t max_addr;
        unsigned int i;

        max_addr = start + len;
        for (i = 0; i < nr_ranges; i++) {
                phys_addr_t end;

                if (!dma_ranges[i].length)
                        return -EINVAL;

                if (check_add_overflow(start, dma_ranges[i].offset,
                                       &phys_vec[i].paddr) ||
                    check_add_overflow(phys_vec[i].paddr,
                                       dma_ranges[i].length, &end))
                        return -EOVERFLOW;
                if (end > max_addr)
                        return -EINVAL;

                phys_vec[i].len = dma_ranges[i].length;
        }
        return 0;
}
EXPORT_SYMBOL_GPL(vfio_pci_core_fill_phys_vec);

int vfio_pci_core_get_dmabuf_phys(struct vfio_pci_core_device *vdev,
                                  struct p2pdma_provider **provider,
                                  unsigned int region_index,
                                  struct dma_buf_phys_vec *phys_vec,
                                  struct vfio_region_dma_range *dma_ranges,
                                  size_t nr_ranges)
{
        struct pci_dev *pdev = vdev->pdev;

        *provider = pcim_p2pdma_provider(pdev, region_index);
        if (!*provider)
                return -EINVAL;

        return vfio_pci_core_fill_phys_vec(
                phys_vec, dma_ranges, nr_ranges,
                pci_resource_start(pdev, region_index),
                pci_resource_len(pdev, region_index));
}
EXPORT_SYMBOL_GPL(vfio_pci_core_get_dmabuf_phys);

static int validate_dmabuf_input(struct vfio_device_feature_dma_buf *dma_buf,
                                 struct vfio_region_dma_range *dma_ranges,
                                 size_t *lengthp)
{
        size_t length = 0;
        u32 i;

        for (i = 0; i < dma_buf->nr_ranges; i++) {
                u64 offset = dma_ranges[i].offset;
                u64 len = dma_ranges[i].length;

                if (!len || !PAGE_ALIGNED(offset) || !PAGE_ALIGNED(len))
                        return -EINVAL;

                if (check_add_overflow(length, len, &length))
                        return -EINVAL;
        }

        /*
         * dma_iova_try_alloc() will WARN if userspace proposes a size that
         * is too big, eg with lots of ranges.
         */
        if ((u64)(length) & DMA_IOVA_USE_SWIOTLB)
                return -EINVAL;

        *lengthp = length;
        return 0;
}

int vfio_pci_core_feature_dma_buf(struct vfio_pci_core_device *vdev, u32 flags,
                                  struct vfio_device_feature_dma_buf __user *arg,
                                  size_t argsz)
{
        struct vfio_device_feature_dma_buf get_dma_buf = {};
        struct vfio_region_dma_range *dma_ranges;
        DEFINE_DMA_BUF_EXPORT_INFO(exp_info);
        struct vfio_pci_dma_buf *priv;
        size_t length;
        int ret;

        if (!vdev->pci_ops || !vdev->pci_ops->get_dmabuf_phys)
                return -EOPNOTSUPP;

        ret = vfio_check_feature(flags, argsz, VFIO_DEVICE_FEATURE_GET,
                                 sizeof(get_dma_buf));
        if (ret != 1)
                return ret;

        if (copy_from_user(&get_dma_buf, arg, sizeof(get_dma_buf)))
                return -EFAULT;

        if (!get_dma_buf.nr_ranges || get_dma_buf.flags)
                return -EINVAL;

        /*
         * For PCI the region_index is the BAR number like everything else.
         */
        if (get_dma_buf.region_index >= VFIO_PCI_ROM_REGION_INDEX)
                return -ENODEV;

        dma_ranges = memdup_array_user(&arg->dma_ranges, get_dma_buf.nr_ranges,
                                       sizeof(*dma_ranges));
        if (IS_ERR(dma_ranges))
                return PTR_ERR(dma_ranges);

        ret = validate_dmabuf_input(&get_dma_buf, dma_ranges, &length);
        if (ret)
                goto err_free_ranges;

        priv = kzalloc(sizeof(*priv), GFP_KERNEL);
        if (!priv) {
                ret = -ENOMEM;
                goto err_free_ranges;
        }
        priv->phys_vec = kcalloc(get_dma_buf.nr_ranges, sizeof(*priv->phys_vec),
                                 GFP_KERNEL);
        if (!priv->phys_vec) {
                ret = -ENOMEM;
                goto err_free_priv;
        }

        priv->vdev = vdev;
        priv->nr_ranges = get_dma_buf.nr_ranges;
        priv->size = length;
        ret = vdev->pci_ops->get_dmabuf_phys(vdev, &priv->provider,
                                             get_dma_buf.region_index,
                                             priv->phys_vec, dma_ranges,
                                             priv->nr_ranges);
        if (ret)
                goto err_free_phys;

        kfree(dma_ranges);
        dma_ranges = NULL;

        if (!vfio_device_try_get_registration(&vdev->vdev)) {
                ret = -ENODEV;
                goto err_free_phys;
        }

        exp_info.ops = &vfio_pci_dmabuf_ops;
        exp_info.size = priv->size;
        exp_info.flags = get_dma_buf.open_flags;
        exp_info.priv = priv;

        priv->dmabuf = dma_buf_export(&exp_info);
        if (IS_ERR(priv->dmabuf)) {
                ret = PTR_ERR(priv->dmabuf);
                goto err_dev_put;
        }

        /* dma_buf_put() now frees priv */
        INIT_LIST_HEAD(&priv->dmabufs_elm);
        down_write(&vdev->memory_lock);
        dma_resv_lock(priv->dmabuf->resv, NULL);
        priv->revoked = !__vfio_pci_memory_enabled(vdev);
        list_add_tail(&priv->dmabufs_elm, &vdev->dmabufs);
        dma_resv_unlock(priv->dmabuf->resv);
        up_write(&vdev->memory_lock);

        /*
         * dma_buf_fd() consumes the reference, when the file closes the dmabuf
         * will be released.
         */
        ret = dma_buf_fd(priv->dmabuf, get_dma_buf.open_flags);
        if (ret < 0)
                goto err_dma_buf;
        return ret;

err_dma_buf:
        dma_buf_put(priv->dmabuf);
err_dev_put:
        vfio_device_put_registration(&vdev->vdev);
err_free_phys:
        kfree(priv->phys_vec);
err_free_priv:
        kfree(priv);
err_free_ranges:
        kfree(dma_ranges);
        return ret;
}

void vfio_pci_dma_buf_move(struct vfio_pci_core_device *vdev, bool revoked)
{
        struct vfio_pci_dma_buf *priv;
        struct vfio_pci_dma_buf *tmp;

        lockdep_assert_held_write(&vdev->memory_lock);

        list_for_each_entry_safe(priv, tmp, &vdev->dmabufs, dmabufs_elm) {
                if (!get_file_active(&priv->dmabuf->file))
                        continue;

                if (priv->revoked != revoked) {
                        dma_resv_lock(priv->dmabuf->resv, NULL);
                        priv->revoked = revoked;
                        dma_buf_move_notify(priv->dmabuf);
                        dma_resv_unlock(priv->dmabuf->resv);
                }
                fput(priv->dmabuf->file);
        }
}

void vfio_pci_dma_buf_cleanup(struct vfio_pci_core_device *vdev)
{
        struct vfio_pci_dma_buf *priv;
        struct vfio_pci_dma_buf *tmp;

        down_write(&vdev->memory_lock);
        list_for_each_entry_safe(priv, tmp, &vdev->dmabufs, dmabufs_elm) {
                if (!get_file_active(&priv->dmabuf->file))
                        continue;

                dma_resv_lock(priv->dmabuf->resv, NULL);
                list_del_init(&priv->dmabufs_elm);
                priv->vdev = NULL;
                priv->revoked = true;
                dma_buf_move_notify(priv->dmabuf);
                dma_resv_unlock(priv->dmabuf->resv);
                vfio_device_put_registration(&vdev->vdev);
                fput(priv->dmabuf->file);
        }
        up_write(&vdev->memory_lock);
}
drivers/vfio/pci/vfio_pci_priv.h (+23)
···
         return (pdev->class >> 8) == PCI_CLASS_DISPLAY_VGA;
 }
 
+#ifdef CONFIG_VFIO_PCI_DMABUF
+int vfio_pci_core_feature_dma_buf(struct vfio_pci_core_device *vdev, u32 flags,
+                                  struct vfio_device_feature_dma_buf __user *arg,
+                                  size_t argsz);
+void vfio_pci_dma_buf_cleanup(struct vfio_pci_core_device *vdev);
+void vfio_pci_dma_buf_move(struct vfio_pci_core_device *vdev, bool revoked);
+#else
+static inline int
+vfio_pci_core_feature_dma_buf(struct vfio_pci_core_device *vdev, u32 flags,
+                              struct vfio_device_feature_dma_buf __user *arg,
+                              size_t argsz)
+{
+        return -ENOTTY;
+}
+static inline void vfio_pci_dma_buf_cleanup(struct vfio_pci_core_device *vdev)
+{
+}
+static inline void vfio_pci_dma_buf_move(struct vfio_pci_core_device *vdev,
+                                         bool revoked)
+{
+}
+#endif
+
 #endif
include/linux/vfio_pci_core.h (+42)
···
 
 struct vfio_pci_core_device;
 struct vfio_pci_region;
+struct p2pdma_provider;
+struct dma_buf_phys_vec;
 
 struct vfio_pci_regops {
         ssize_t (*rw)(struct vfio_pci_core_device *vdev, char __user *buf,
···
         u32 flags;
 };
 
+struct vfio_pci_device_ops {
+        int (*get_dmabuf_phys)(struct vfio_pci_core_device *vdev,
+                               struct p2pdma_provider **provider,
+                               unsigned int region_index,
+                               struct dma_buf_phys_vec *phys_vec,
+                               struct vfio_region_dma_range *dma_ranges,
+                               size_t nr_ranges);
+};
+
+#if IS_ENABLED(CONFIG_VFIO_PCI_DMABUF)
+int vfio_pci_core_fill_phys_vec(struct dma_buf_phys_vec *phys_vec,
+                                struct vfio_region_dma_range *dma_ranges,
+                                size_t nr_ranges, phys_addr_t start,
+                                phys_addr_t len);
+int vfio_pci_core_get_dmabuf_phys(struct vfio_pci_core_device *vdev,
+                                  struct p2pdma_provider **provider,
+                                  unsigned int region_index,
+                                  struct dma_buf_phys_vec *phys_vec,
+                                  struct vfio_region_dma_range *dma_ranges,
+                                  size_t nr_ranges);
+#else
+static inline int
+vfio_pci_core_fill_phys_vec(struct dma_buf_phys_vec *phys_vec,
+                            struct vfio_region_dma_range *dma_ranges,
+                            size_t nr_ranges, phys_addr_t start,
+                            phys_addr_t len)
+{
+        return -EINVAL;
+}
+static inline int vfio_pci_core_get_dmabuf_phys(
+        struct vfio_pci_core_device *vdev, struct p2pdma_provider **provider,
+        unsigned int region_index, struct dma_buf_phys_vec *phys_vec,
+        struct vfio_region_dma_range *dma_ranges, size_t nr_ranges)
+{
+        return -EOPNOTSUPP;
+}
+#endif
+
 struct vfio_pci_core_device {
         struct vfio_device vdev;
         struct pci_dev *pdev;
+        const struct vfio_pci_device_ops *pci_ops;
         void __iomem *barmap[PCI_STD_NUM_BARS];
         bool bar_mmap_supported[PCI_STD_NUM_BARS];
         u8 *pci_config_map;
···
         struct vfio_pci_core_device *sriov_pf_core_dev;
         struct notifier_block nb;
         struct rw_semaphore memory_lock;
+        struct list_head dmabufs;
 };
 
 /* Will be exported for vfio pci drivers usage */
include/uapi/linux/vfio.h (+28)
···
 
 #include <linux/types.h>
 #include <linux/ioctl.h>
+#include <linux/stddef.h>
 
 #define VFIO_API_VERSION 0
···
 #define VFIO_DEVICE_FEATURE_SET_MASTER 1 /* Set Bus Master */
 };
 #define VFIO_DEVICE_FEATURE_BUS_MASTER 10
+
+/**
+ * Upon VFIO_DEVICE_FEATURE_GET create a dma_buf fd for the
+ * regions selected.
+ *
+ * open_flags are the typical flags passed to open(2), eg O_RDWR, O_CLOEXEC,
+ * etc. offset/length specify a slice of the region to create the dmabuf from.
+ * nr_ranges is the total number of (P2P DMA) ranges that comprise the dmabuf.
+ *
+ * flags should be 0.
+ *
+ * Return: The fd number on success, -1 and errno is set on failure.
+ */
+#define VFIO_DEVICE_FEATURE_DMA_BUF 11
+
+struct vfio_region_dma_range {
+        __u64 offset;
+        __u64 length;
+};
+
+struct vfio_device_feature_dma_buf {
+        __u32 region_index;
+        __u32 open_flags;
+        __u32 flags;
+        __u32 nr_ranges;
+        struct vfio_region_dma_range dma_ranges[] __counted_by(nr_ranges);
+};
 
 /* -------- API for Type1 VFIO IOMMU -------- */
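
A userspace sketch of driving the new feature (illustrative; error
handling is elided, vfio_export_bar0() is a made-up helper, and the
device fd comes from the usual VFIO/iommufd open flow):

#include <fcntl.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <linux/vfio.h>

/* Hypothetical helper: export the first 2 MiB of BAR 0 as a dmabuf FD. */
static int vfio_export_bar0(int device_fd)
{
        struct vfio_device_feature *feature;
        struct vfio_device_feature_dma_buf *get_dma_buf;
        size_t argsz = sizeof(*feature) + sizeof(*get_dma_buf) +
                       sizeof(struct vfio_region_dma_range);
        int dmabuf_fd;

        feature = calloc(1, argsz);
        feature->argsz = argsz;
        feature->flags = VFIO_DEVICE_FEATURE_GET | VFIO_DEVICE_FEATURE_DMA_BUF;

        get_dma_buf = (void *)feature->data;
        get_dma_buf->region_index = VFIO_PCI_BAR0_REGION_INDEX;
        get_dma_buf->open_flags = O_CLOEXEC | O_RDWR;
        get_dma_buf->nr_ranges = 1;
        get_dma_buf->dma_ranges[0].offset = 0;
        get_dma_buf->dma_ranges[0].length = 2 * 1024 * 1024;

        /* On success the return value is the new dmabuf FD, ready to be
         * handed to an importer such as an RDMA driver. */
        dmabuf_fd = ioctl(device_fd, VFIO_DEVICE_FEATURE, feature);
        free(feature);
        return dmabuf_fd;
}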