dma-buf: provide phys_vec to scatter-gather mapping routine

Add dma_buf_phys_vec_to_sgt() and dma_buf_free_sgt() helpers to convert
an array of MMIO physical address ranges into scatter-gather tables with
proper DMA mapping.
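
For illustration, here is a minimal exporter-side sketch of wiring these helpers into dma_buf_ops. The my_exporter structure and the my_* callback names are assumptions made for this example, not part of the patch:

#include <linux/dma-buf.h>
#include <linux/dma-buf-mapping.h>

/*
 * Hypothetical exporter state: the provider comes from the PCI P2PDMA
 * subsystem and phys_vec describes the BAR ranges backing the buffer.
 */
struct my_exporter {
        struct p2pdma_provider *provider;
        struct dma_buf_phys_vec *phys_vec;
        size_t nr_ranges;
        size_t size;            /* sum of phys_vec[i].len */
};

static struct sg_table *my_map_dma_buf(struct dma_buf_attachment *attach,
                                       enum dma_data_direction dir)
{
        struct my_exporter *priv = attach->dmabuf->priv;

        /*
         * The dma-buf core calls map_dma_buf with the reservation lock
         * held, which dma_buf_phys_vec_to_sgt() asserts.
         */
        return dma_buf_phys_vec_to_sgt(attach, priv->provider, priv->phys_vec,
                                       priv->nr_ranges, priv->size, dir);
}

static void my_unmap_dma_buf(struct dma_buf_attachment *attach,
                             struct sg_table *sgt,
                             enum dma_data_direction dir)
{
        dma_buf_free_sgt(attach, sgt, dir);
}

static const struct dma_buf_ops my_dmabuf_ops = {
        /* release and the other mandatory callbacks are omitted from this sketch */
        .map_dma_buf = my_map_dma_buf,
        .unmap_dma_buf = my_unmap_dma_buf,
};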

These common functions are a starting point and support any PCI drivers
creating mappings from their BAR's MMIO addresses. VFIO is one case, and
RDMA will follow shortly. Existing DRM drivers can be reviewed and
refactored separately. We hope this will evolve into routines that help
common DRM code with mixed CPU and MMIO mappings.

Compared to the dma_map_resource() abuse, this implementation handles
the complicated PCI P2P scenarios properly, especially when an IOMMU
is enabled (a routing pre-check sketch follows the list below):

- Direct bus address mapping without IOVA allocation for
PCI_P2PDMA_MAP_BUS_ADDR, using pci_p2pdma_bus_addr_map(). This
happens if the IOMMU is enabled but the PCIe switch ACS flags allow
transactions to avoid the host bridge.

Further, this handles the slightly obscure case of MMIO with a
phys_addr_t that differs from the physical BAR programming
(bus offset). The phys_addr_t is converted to a dma_addr_t that
accounts for this offset, which enables certain real systems,
especially ARM platforms, to work.

- Mapping through the host bridge with IOVA allocation and the DMA_ATTR_MMIO
attribute for MMIO memory regions (PCI_P2PDMA_MAP_THRU_HOST_BRIDGE).
This happens when the IOMMU is enabled and the ACS flags force all
traffic through the IOMMU, i.e. on virtualization systems.

- Cases where P2P is not supported through the host bridge/CPU. The
P2P subsystem is the proper place to detect this and block it.
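
As a hedged illustration of this routing decision, an exporter that requires direct P2P routing could pre-check the mapping type before building the table. The my_require_direct_p2p() helper and the rejection policy are assumptions for this sketch, not part of the patch:

#include <linux/pci-p2pdma.h>
#include <linux/dma-buf.h>

/*
 * Hypothetical policy check: only accept importers whose traffic the
 * PCIe fabric can route directly (PCI_P2PDMA_MAP_BUS_ADDR); reject
 * attachments that would bounce through the host bridge or that
 * cannot do P2P at all.
 */
static int my_require_direct_p2p(struct p2pdma_provider *provider,
                                 struct dma_buf_attachment *attach)
{
        switch (pci_p2pdma_map_type(provider, attach->dev)) {
        case PCI_P2PDMA_MAP_BUS_ADDR:
                return 0;               /* ACS allows switch-level routing */
        case PCI_P2PDMA_MAP_THRU_HOST_BRIDGE:
                return -EOPNOTSUPP;     /* would detour through the IOMMU */
        default:
                return -EINVAL;         /* P2P blocked by the P2P subsystem */
        }
}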

Helper functions fill_sg_entry() and calc_sg_nents() handle the
scatter-gather table construction, splitting large regions into
UINT_MAX-sized chunks to fit within sg->length field limits.
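
For example, a single 6 GiB IOVA span (6,442,450,944 bytes) needs
DIV_ROUND_UP(6 GiB, UINT_MAX) = 2 entries: the first carries UINT_MAX
(4,294,967,295) bytes at the base DMA address, and the second carries the
remaining 2,147,483,649 bytes at base + UINT_MAX.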

Since the physical address based DMA API forbids use of the CPU list
of the scatterlist, this produces a mangled scatterlist with a fully
zeroed and NULL'd CPU list: orig_nents is 0, and every struct page
pointer is NULL with zero length. This is stronger and more robust
than the existing mangle_sg_table() technique. Migrating DMABUF as a
subsystem away from using scatterlist for this data structure is a
future project.
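
A hedged importer-side sketch of consuming such a DMA-only table (my_program_hw is a made-up name): only the DMA iterator may be used, never the CPU/page side:

#include <linux/device.h>
#include <linux/scatterlist.h>

/* Walk a DMA-only table produced by dma_buf_phys_vec_to_sgt(). */
static void my_program_hw(struct device *dev, struct sg_table *sgt)
{
        struct scatterlist *sg;
        int i;

        /* orig_nents is 0 and sg_page() is NULL; only nents and the DMA fields count */
        for_each_sgtable_dma_sg(sgt, sg, i)
                dev_dbg(dev, "segment %d: %pad + %u\n", i,
                        &sg_dma_address(sg), sg_dma_len(sg));
}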

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Tested-by: Alex Mastro <amastro@fb.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Acked-by: Christian König <christian.koenig@amd.com>
Acked-by: Ankit Agrawal <ankita@nvidia.com>
Link: https://lore.kernel.org/r/20251120-dmabuf-vfio-v9-6-d7f71607f371@nvidia.com
Signed-off-by: Alex Williamson <alex@shazbot.org>

Authored by Leon Romanovsky and committed by Alex Williamson
3aa31a8b 50d44fce

+277 -1
+1 -1
drivers/dma-buf/Makefile
 # SPDX-License-Identifier: GPL-2.0-only
 obj-y := dma-buf.o dma-fence.o dma-fence-array.o dma-fence-chain.o \
-	dma-fence-unwrap.o dma-resv.o
+	dma-fence-unwrap.o dma-resv.o dma-buf-mapping.o
 obj-$(CONFIG_DMABUF_HEAPS) += dma-heap.o
 obj-$(CONFIG_DMABUF_HEAPS) += heaps/
 obj-$(CONFIG_SYNC_FILE) += sync_file.o
+248
drivers/dma-buf/dma-buf-mapping.c
// SPDX-License-Identifier: GPL-2.0-only
/*
 * DMA BUF Mapping Helpers
 *
 */
#include <linux/dma-buf-mapping.h>
#include <linux/dma-resv.h>

static struct scatterlist *fill_sg_entry(struct scatterlist *sgl, size_t length,
                                         dma_addr_t addr)
{
        unsigned int len, nents;
        int i;

        nents = DIV_ROUND_UP(length, UINT_MAX);
        for (i = 0; i < nents; i++) {
                len = min_t(size_t, length, UINT_MAX);
                length -= len;
                /*
                 * DMABUF abuses scatterlist to create a scatterlist
                 * that does not have any CPU list, only the DMA list.
                 * Always set the page related values to NULL to ensure
                 * importers can't use it. The phys_addr based DMA API
                 * does not require the CPU list for mapping or unmapping.
                 */
                sg_set_page(sgl, NULL, 0, 0);
                sg_dma_address(sgl) = addr + i * UINT_MAX;
                sg_dma_len(sgl) = len;
                sgl = sg_next(sgl);
        }

        return sgl;
}

static unsigned int calc_sg_nents(struct dma_iova_state *state,
                                  struct dma_buf_phys_vec *phys_vec,
                                  size_t nr_ranges, size_t size)
{
        unsigned int nents = 0;
        size_t i;

        if (!state || !dma_use_iova(state)) {
                for (i = 0; i < nr_ranges; i++)
                        nents += DIV_ROUND_UP(phys_vec[i].len, UINT_MAX);
        } else {
                /*
                 * In IOVA case, there is only one SG entry which spans
                 * for whole IOVA address space, but we need to make sure
                 * that it fits sg->length, maybe we need more.
                 */
                nents = DIV_ROUND_UP(size, UINT_MAX);
        }

        return nents;
}

/**
 * struct dma_buf_dma - holds DMA mapping information
 * @sgt: Scatter-gather table
 * @state: DMA IOVA state relevant in IOMMU-based DMA
 * @size: Total size of DMA transfer
 */
struct dma_buf_dma {
        struct sg_table sgt;
        struct dma_iova_state *state;
        size_t size;
};

/**
 * dma_buf_phys_vec_to_sgt - Returns the scatterlist table of the attachment
 * from arrays of physical vectors. This funciton is intended for MMIO memory
 * only.
 * @attach:	[in]	attachment whose scatterlist is to be returned
 * @provider:	[in]	p2pdma provider
 * @phys_vec:	[in]	array of physical vectors
 * @nr_ranges:	[in]	number of entries in phys_vec array
 * @size:	[in]	total size of phys_vec
 * @dir:	[in]	direction of DMA transfer
 *
 * Returns sg_table containing the scatterlist to be returned; returns ERR_PTR
 * on error. May return -EINTR if it is interrupted by a signal.
 *
 * On success, the DMA addresses and lengths in the returned scatterlist are
 * PAGE_SIZE aligned.
 *
 * A mapping must be unmapped by using dma_buf_free_sgt().
 *
 * NOTE: This function is intended for exporters. If direct traffic routing is
 * mandatory exporter should call routing pci_p2pdma_map_type() before calling
 * this function.
 */
struct sg_table *dma_buf_phys_vec_to_sgt(struct dma_buf_attachment *attach,
                                         struct p2pdma_provider *provider,
                                         struct dma_buf_phys_vec *phys_vec,
                                         size_t nr_ranges, size_t size,
                                         enum dma_data_direction dir)
{
        unsigned int nents, mapped_len = 0;
        struct dma_buf_dma *dma;
        struct scatterlist *sgl;
        dma_addr_t addr;
        size_t i;
        int ret;

        dma_resv_assert_held(attach->dmabuf->resv);

        if (WARN_ON(!attach || !attach->dmabuf || !provider))
                /* This function is supposed to work on MMIO memory only */
                return ERR_PTR(-EINVAL);

        dma = kzalloc(sizeof(*dma), GFP_KERNEL);
        if (!dma)
                return ERR_PTR(-ENOMEM);

        switch (pci_p2pdma_map_type(provider, attach->dev)) {
        case PCI_P2PDMA_MAP_BUS_ADDR:
                /*
                 * There is no need in IOVA at all for this flow.
                 */
                break;
        case PCI_P2PDMA_MAP_THRU_HOST_BRIDGE:
                dma->state = kzalloc(sizeof(*dma->state), GFP_KERNEL);
                if (!dma->state) {
                        ret = -ENOMEM;
                        goto err_free_dma;
                }

                dma_iova_try_alloc(attach->dev, dma->state, 0, size);
                break;
        default:
                ret = -EINVAL;
                goto err_free_dma;
        }

        nents = calc_sg_nents(dma->state, phys_vec, nr_ranges, size);
        ret = sg_alloc_table(&dma->sgt, nents, GFP_KERNEL | __GFP_ZERO);
        if (ret)
                goto err_free_state;

        sgl = dma->sgt.sgl;

        for (i = 0; i < nr_ranges; i++) {
                if (!dma->state) {
                        addr = pci_p2pdma_bus_addr_map(provider,
                                                       phys_vec[i].paddr);
                } else if (dma_use_iova(dma->state)) {
                        ret = dma_iova_link(attach->dev, dma->state,
                                            phys_vec[i].paddr, 0,
                                            phys_vec[i].len, dir,
                                            DMA_ATTR_MMIO);
                        if (ret)
                                goto err_unmap_dma;

                        mapped_len += phys_vec[i].len;
                } else {
                        addr = dma_map_phys(attach->dev, phys_vec[i].paddr,
                                            phys_vec[i].len, dir,
                                            DMA_ATTR_MMIO);
                        ret = dma_mapping_error(attach->dev, addr);
                        if (ret)
                                goto err_unmap_dma;
                }

                if (!dma->state || !dma_use_iova(dma->state))
                        sgl = fill_sg_entry(sgl, phys_vec[i].len, addr);
        }

        if (dma->state && dma_use_iova(dma->state)) {
                WARN_ON_ONCE(mapped_len != size);
                ret = dma_iova_sync(attach->dev, dma->state, 0, mapped_len);
                if (ret)
                        goto err_unmap_dma;

                sgl = fill_sg_entry(sgl, mapped_len, dma->state->addr);
        }

        dma->size = size;

        /*
         * No CPU list included - set orig_nents = 0 so others can detect
         * this via SG table (use nents only).
         */
        dma->sgt.orig_nents = 0;

        /*
         * SGL must be NULL to indicate that SGL is the last one
         * and we allocated correct number of entries in sg_alloc_table()
         */
        WARN_ON_ONCE(sgl);
        return &dma->sgt;

err_unmap_dma:
        if (!i || !dma->state) {
                ; /* Do nothing */
        } else if (dma_use_iova(dma->state)) {
                dma_iova_destroy(attach->dev, dma->state, mapped_len, dir,
                                 DMA_ATTR_MMIO);
        } else {
                for_each_sgtable_dma_sg(&dma->sgt, sgl, i)
                        dma_unmap_phys(attach->dev, sg_dma_address(sgl),
                                       sg_dma_len(sgl), dir, DMA_ATTR_MMIO);
        }
        sg_free_table(&dma->sgt);
err_free_state:
        kfree(dma->state);
err_free_dma:
        kfree(dma);
        return ERR_PTR(ret);
}
EXPORT_SYMBOL_NS_GPL(dma_buf_phys_vec_to_sgt, "DMA_BUF");

/**
 * dma_buf_free_sgt - unmaps the buffer
 * @attach:	[in]	attachment to unmap buffer from
 * @sgt:	[in]	scatterlist info of the buffer to unmap
 * @dir:	[in]	direction of DMA transfer
 *
 * This unmaps a DMA mapping for @attached obtained
 * by dma_buf_phys_vec_to_sgt().
 */
void dma_buf_free_sgt(struct dma_buf_attachment *attach, struct sg_table *sgt,
                      enum dma_data_direction dir)
{
        struct dma_buf_dma *dma = container_of(sgt, struct dma_buf_dma, sgt);
        int i;

        dma_resv_assert_held(attach->dmabuf->resv);

        if (!dma->state) {
                ; /* Do nothing */
        } else if (dma_use_iova(dma->state)) {
                dma_iova_destroy(attach->dev, dma->state, dma->size, dir,
                                 DMA_ATTR_MMIO);
        } else {
                struct scatterlist *sgl;

                for_each_sgtable_dma_sg(sgt, sgl, i)
                        dma_unmap_phys(attach->dev, sg_dma_address(sgl),
                                       sg_dma_len(sgl), dir, DMA_ATTR_MMIO);
        }

        sg_free_table(sgt);
        kfree(dma->state);
        kfree(dma);
}
EXPORT_SYMBOL_NS_GPL(dma_buf_free_sgt, "DMA_BUF");
+17
include/linux/dma-buf-mapping.h
/* SPDX-License-Identifier: GPL-2.0-only */
/*
 * DMA BUF Mapping Helpers
 *
 */
#ifndef __DMA_BUF_MAPPING_H__
#define __DMA_BUF_MAPPING_H__
#include <linux/dma-buf.h>

struct sg_table *dma_buf_phys_vec_to_sgt(struct dma_buf_attachment *attach,
                                         struct p2pdma_provider *provider,
                                         struct dma_buf_phys_vec *phys_vec,
                                         size_t nr_ranges, size_t size,
                                         enum dma_data_direction dir);
void dma_buf_free_sgt(struct dma_buf_attachment *attach, struct sg_table *sgt,
                      enum dma_data_direction dir);
#endif
+11
include/linux/dma-buf.h
 #include <linux/fs.h>
 #include <linux/dma-fence.h>
 #include <linux/wait.h>
+#include <linux/pci-p2pdma.h>

 struct device;
 struct dma_buf;
···
 	int flags;
 	struct dma_resv *resv;
 	void *priv;
+};
+
+/**
+ * struct dma_buf_phys_vec - describe continuous chunk of memory
+ * @paddr: physical address of that chunk
+ * @len: Length of this chunk
+ */
+struct dma_buf_phys_vec {
+	phys_addr_t paddr;
+	size_t len;
 };

 /**
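
As a final hedged illustration of filling the new structure (my_fill_bar_vec and the single-BAR assumption are examples, not part of the patch), an exporter could describe an entire PCI BAR with one dma_buf_phys_vec entry:

#include <linux/pci.h>
#include <linux/dma-buf.h>

/* Describe a whole BAR as a single contiguous MMIO range. */
static void my_fill_bar_vec(struct pci_dev *pdev, int bar,
                            struct dma_buf_phys_vec *vec)
{
        vec->paddr = pci_resource_start(pdev, bar);
        vec->len = pci_resource_len(pdev, bar);
}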