Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost

Pull virtio updates from Michael Tsirkin:

- in-order support in virtio core

- multiple address space support in vduse

- fixes, cleanups all over the place, notably dma alignment fixes for
non-cache-coherent systems

* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (59 commits)
vduse: avoid adding implicit padding
vhost: fix caching attributes of MMIO regions by setting them explicitly
vdpa/mlx5: update MAC address handling in mlx5_vdpa_set_attr()
vdpa/mlx5: reuse common function for MAC address updates
vdpa/mlx5: update mlx_features with driver state check
crypto: virtio: Replace package id with numa node id
crypto: virtio: Remove duplicated virtqueue_kick in virtio_crypto_skcipher_crypt_req
crypto: virtio: Add spinlock protection with virtqueue notification
Documentation: Add documentation for VDUSE Address Space IDs
vduse: bump version number
vduse: add vq group asid support
vduse: merge tree search logic of IOTLB_GET_FD and IOTLB_GET_INFO ioctls
vduse: take out allocations from vduse_dev_alloc_coherent
vduse: remove unused vaddr parameter of vduse_domain_free_coherent
vduse: refactor vdpa_dev_add for goto err handling
vhost: forbid change vq groups ASID if DRIVER_OK is set
vdpa: document set_group_asid thread safety
vduse: return internal vq group struct as map token
vduse: add vq group support
vduse: add v1 API definition
...

+1561 -513
+52
Documentation/core-api/dma-api-howto.rst
···
 networking subsystems make sure that the buffers they use are valid
 for you to DMA from/to.

+__dma_from_device_group_begin/end annotations
+=============================================
+
+As explained previously, when a structure contains a DMA_FROM_DEVICE /
+DMA_BIDIRECTIONAL buffer (device writes to memory) alongside fields that the
+CPU writes to, cache line sharing between the DMA buffer and CPU-written fields
+can cause data corruption on CPUs with DMA-incoherent caches.
+
+The ``__dma_from_device_group_begin(GROUP)/__dma_from_device_group_end(GROUP)``
+macros ensure proper alignment to prevent this::
+
+    struct my_device {
+        spinlock_t lock1;
+        __dma_from_device_group_begin();
+        char dma_buffer1[16];
+        char dma_buffer2[16];
+        __dma_from_device_group_end();
+        spinlock_t lock2;
+    };
+
+To isolate a DMA buffer from adjacent fields, use
+``__dma_from_device_group_begin(GROUP)`` before the first DMA buffer
+field and ``__dma_from_device_group_end(GROUP)`` after the last DMA
+buffer field (with the same GROUP name). This protects both the head
+and tail of the buffer from cache line sharing.
+
+The GROUP parameter is an optional identifier that names the DMA buffer group
+(in case you have several in the same structure)::
+
+    struct my_device {
+        spinlock_t lock1;
+        __dma_from_device_group_begin(buffer1);
+        char dma_buffer1[16];
+        __dma_from_device_group_end(buffer1);
+        spinlock_t lock2;
+        __dma_from_device_group_begin(buffer2);
+        char dma_buffer2[16];
+        __dma_from_device_group_end(buffer2);
+    };
+
+On cache-coherent platforms these macros expand to zero-length array markers.
+On non-coherent platforms, they also ensure the minimal DMA alignment, which
+can be as large as 128 bytes.
+
+.. note::
+
+    It is allowed (though somewhat fragile) to include extra fields, not
+    intended for DMA from the device, within the group (in order to pack the
+    structure tightly) - but only as long as the CPU does not write these
+    fields while any fields in the group are mapped for DMA_FROM_DEVICE or
+    DMA_BIDIRECTIONAL.
+
 DMA addressing capabilities
 ===========================

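The layout effect described in the documentation hunk above can be approximated in userspace. What follows is a minimal sketch, not the kernel macros: it assumes a 128-byte minimum DMA alignment (`DMA_MIN_ALIGN`, `struct my_device_sketch`, and both helper functions are names invented for this illustration) and uses C11 `_Alignas` where the kernel structure would use `__dma_from_device_group_begin()` and `__dma_from_device_group_end()`.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed worst-case minimum DMA alignment; hypothetical constant, not a kernel symbol. */
#define DMA_MIN_ALIGN 128

static_assert(DMA_MIN_ALIGN >= 32, "sketch expects both 16-byte buffers to fit in one line");

/* Userspace stand-in for the documented struct my_device (locks replaced by ints). */
struct my_device_sketch {
	int lock1;                                    /* CPU-written field */
	_Alignas(DMA_MIN_ALIGN) char dma_buffer1[16]; /* group begins on a fresh line */
	char dma_buffer2[16];
	_Alignas(DMA_MIN_ALIGN) int lock2;            /* first field after the group starts a new line */
};

/* Returns 1 if byte offsets a and b fall in the same DMA_MIN_ALIGN-sized line. */
static int same_dma_line(size_t a, size_t b)
{
	return a / DMA_MIN_ALIGN == b / DMA_MIN_ALIGN;
}

/* Returns 1 if neither CPU-written field shares a line with the DMA buffers. */
static int group_is_isolated(void)
{
	size_t first = offsetof(struct my_device_sketch, dma_buffer1);
	size_t last = offsetof(struct my_device_sketch, dma_buffer2) + 16 - 1;

	return !same_dma_line(offsetof(struct my_device_sketch, lock1), first) &&
	       !same_dma_line(offsetof(struct my_device_sketch, lock2), last);
}
```

The cost of this isolation is padding: with the assumed 128-byte alignment the sketch structure grows to several lines even though it holds only 40 bytes of payload, which is presumably why the note above allows packing extra, CPU-read-only fields into the group.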
+9
Documentation/core-api/dma-attributes.rst
···
 For architectures that require cache flushing for DMA coherence
 DMA_ATTR_MMIO will not perform any cache flushing. The address
 provided must never be mapped cacheable into the CPU.
+
+DMA_ATTR_CPU_CACHE_CLEAN
+------------------------
+
+This attribute indicates the CPU will not dirty any cacheline overlapping this
+DMA_FROM_DEVICE/DMA_BIDIRECTIONAL buffer while it is mapped. This allows
+multiple small buffers to safely share a cacheline without risk of data
+corruption, suppressing DMA debug warnings about overlapping mappings.
+All mappings sharing a cacheline should have this attribute.
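To make "sharing a cacheline" concrete, here is a small userspace sketch of the overlap test. It is only an illustration of the arithmetic, not the DMA debug code: the function name and the explicit `line` parameter are assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Returns 1 if the byte ranges [a, a + alen) and [b, b + blen) touch at
 * least one common cache line of `line` bytes, 0 otherwise.
 * Hypothetical helper for illustration; lengths and line size must be non-zero.
 */
static int buffers_share_cacheline(uintptr_t a, size_t alen,
				   uintptr_t b, size_t blen, size_t line)
{
	assert(alen > 0 && blen > 0 && line > 0);

	uintptr_t a_first = a / line, a_last = (a + alen - 1) / line;
	uintptr_t b_first = b / line, b_last = (b + blen - 1) / line;

	/* Two inclusive line ranges intersect iff each starts before the other ends. */
	return a_first <= b_last && b_first <= a_last;
}
```

Two adjacent 16-byte buffers inside one 64-byte line are the case the attribute is aimed at: they overlap at cacheline granularity, so every mapping involved would need the attribute and the CPU must leave the whole line clean while they are mapped.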
+53
Documentation/userspace-api/vduse.rst
···
 5. Inject an interrupt for specific virtqueue with the VDUSE_INJECT_VQ_IRQ ioctl
    after the used ring is filled.

+Enabling ASID (API version 1)
+------------------------------
+
+VDUSE supports per-address-space identifiers (ASIDs) starting with API
+version 1. Set it up with ioctl(VDUSE_SET_API_VERSION) on `/dev/vduse/control`
+and pass `VDUSE_API_VERSION_1` before creating a new VDUSE instance with
+ioctl(VDUSE_CREATE_DEV).
+
+Afterwards, you can use the member asid of ioctl(VDUSE_VQ_SETUP) argument to
+select the address space of the IOTLB you are querying. The driver could
+change the address space of any virtqueue group by using the
+VDUSE_SET_VQ_GROUP_ASID VDUSE message type, and the VDUSE instance needs to
+reply with VDUSE_REQ_RESULT_OK if it was possible to change it.
+
+Similarly, you can use ioctl(VDUSE_IOTLB_GET_FD2) to obtain the file descriptor
+describing an IOVA region of a specific ASID. Example usage:
+
+.. code-block:: c
+
+    static void *iova_to_va(int dev_fd, uint32_t asid, uint64_t iova,
+                            uint64_t *len)
+    {
+        int fd;
+        void *addr;
+        size_t size;
+        struct vduse_iotlb_entry_v2 entry = { 0 };
+
+        entry.v1.start = iova;
+        entry.v1.last = iova;
+        entry.asid = asid;
+
+        fd = ioctl(dev_fd, VDUSE_IOTLB_GET_FD2, &entry);
+        if (fd < 0)
+            return NULL;
+
+        size = entry.v1.last - entry.v1.start + 1;
+        *len = entry.v1.last - iova + 1;
+        addr = mmap(0, size, perm_to_prot(entry.v1.perm), MAP_SHARED,
+                    fd, entry.v1.offset);
+        close(fd);
+        if (addr == MAP_FAILED)
+            return NULL;
+
+        /*
+         * Using some data structures such as linked list to store
+         * the iotlb mapping. The munmap(2) should be called for the
+         * cached mapping when the corresponding VDUSE_UPDATE_IOTLB
+         * message is received or the device is reset.
+         */
+
+        return addr + iova - entry.v1.start;
+    }
+
 For more details on the uAPI, please see include/uapi/linux/vduse.h.
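The address arithmetic in `iova_to_va()` above is easy to get off by one, because `last` is an inclusive bound. The following is a standalone sketch of just that arithmetic; `struct iova_region` and the three helpers are hypothetical names for this illustration and are not part of the VDUSE uAPI.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the start/last fields the ioctl fills in. */
struct iova_region {
	uint64_t start; /* first IOVA covered by the mapping */
	uint64_t last;  /* last IOVA covered by the mapping (inclusive) */
};

/* Size in bytes of the whole region, i.e. the length handed to mmap(). */
static uint64_t region_size(const struct iova_region *r)
{
	assert(r->start <= r->last);
	return r->last - r->start + 1;
}

/* Bytes usable from iova up to the end of the region (the *len output). */
static uint64_t region_len_from(const struct iova_region *r, uint64_t iova)
{
	assert(r->start <= iova && iova <= r->last);
	return r->last - iova + 1;
}

/* Offset of iova inside the mapped area (added to the mmap() result). */
static uint64_t region_offset_of(const struct iova_region *r, uint64_t iova)
{
	assert(r->start <= iova && iova <= r->last);
	return iova - r->start;
}
```

For a region covering 0x1000 through 0x1fff and a query at 0x1800, the whole mapping is 0x1000 bytes, the caller may access 0x800 bytes from the returned pointer, and that pointer sits 0x800 bytes into the mapping.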
+3
drivers/char/hw_random/virtio-rng.c
···
 #include <linux/spinlock.h>
 #include <linux/virtio.h>
 #include <linux/virtio_rng.h>
+#include <linux/dma-mapping.h>
 #include <linux/module.h>
 #include <linux/slab.h>

···
 	unsigned int data_avail;
 	unsigned int data_idx;
 	/* minimal size returned by rng_buffer_size() */
+	__dma_from_device_group_begin();
 #if SMP_CACHE_BYTES < 32
 	u8 data[32];
 #else
 	u8 data[SMP_CACHE_BYTES];
 #endif
+	__dma_from_device_group_end();
 };

 static void random_recv_done(struct virtqueue *vq)
+11 -4
drivers/gpio/gpio-virtio.c
···
  */

 #include <linux/completion.h>
+#include <linux/dma-mapping.h>
 #include <linux/err.h>
 #include <linux/gpio/driver.h>
 #include <linux/io.h>
···
 struct virtio_gpio_line {
 	struct mutex lock; /* Protects line operation */
 	struct completion completion;
-	struct virtio_gpio_request req ____cacheline_aligned;
-	struct virtio_gpio_response res ____cacheline_aligned;
+
 	unsigned int rxlen;
+
+	__dma_from_device_group_begin();
+	struct virtio_gpio_request req;
+	struct virtio_gpio_response res;
+	__dma_from_device_group_end();
 };

 struct vgpio_irq_line {
···
 	bool update_pending;
 	bool queue_pending;

-	struct virtio_gpio_irq_request ireq ____cacheline_aligned;
-	struct virtio_gpio_irq_response ires ____cacheline_aligned;
+	__dma_from_device_group_begin();
+	struct virtio_gpio_irq_request ireq;
+	struct virtio_gpio_irq_response ires;
+	__dma_from_device_group_end();
 };

 struct virtio_gpio {
+12 -5
drivers/scsi/virtio_scsi.c
···
 #include <scsi/scsi_tcq.h>
 #include <scsi/scsi_devinfo.h>
 #include <linux/seqlock.h>
+#include <linux/dma-mapping.h>

 #include "sd.h"

···

 struct virtio_scsi_event_node {
 	struct virtio_scsi *vscsi;
-	struct virtio_scsi_event event;
+	struct virtio_scsi_event *event;
 	struct work_struct work;
 };

···

 	struct virtio_scsi_vq ctrl_vq;
 	struct virtio_scsi_vq event_vq;
+
+	__dma_from_device_group_begin();
+	struct virtio_scsi_event events[VIRTIO_SCSI_EVENT_LEN];
+	__dma_from_device_group_end();
+
 	struct virtio_scsi_vq req_vqs[];
 };

···
 	unsigned long flags;

 	INIT_WORK(&event_node->work, virtscsi_handle_event);
-	sg_init_one(&sg, &event_node->event, sizeof(struct virtio_scsi_event));
+	sg_init_one(&sg, event_node->event, sizeof(struct virtio_scsi_event));

 	spin_lock_irqsave(&vscsi->event_vq.vq_lock, flags);

-	err = virtqueue_add_inbuf(vscsi->event_vq.vq, &sg, 1, event_node,
-				  GFP_ATOMIC);
+	err = virtqueue_add_inbuf_cache_clean(vscsi->event_vq.vq, &sg, 1, event_node,
+					      GFP_ATOMIC);
 	if (!err)
 		virtqueue_kick(vscsi->event_vq.vq);

···

 	for (i = 0; i < VIRTIO_SCSI_EVENT_LEN; i++) {
 		vscsi->event_list[i].vscsi = vscsi;
+		vscsi->event_list[i].event = &vscsi->events[i];
 		virtscsi_kick_event(vscsi, &vscsi->event_list[i]);
 	}

···
 	struct virtio_scsi_event_node *event_node =
 		container_of(work, struct virtio_scsi_event_node, work);
 	struct virtio_scsi *vscsi = event_node->vscsi;
-	struct virtio_scsi_event *event = &event_node->event;
+	struct virtio_scsi_event *event = event_node->event;

 	if (event->event &
 	    cpu_to_virtio32(vscsi->vdev, VIRTIO_SCSI_T_EVENTS_MISSED)) {
+85 -71
drivers/vdpa/mlx5/net/mlx5_vnet.c
···
 	mlx5_destroy_flow_table(ndev->rxft);
 }

+static int mlx5_vdpa_change_mac(struct mlx5_vdpa_net *ndev,
+				struct mlx5_core_dev *pfmdev,
+				const u8 *new_mac)
+{
+	struct mlx5_vdpa_dev *mvdev = &ndev->mvdev;
+	u8 old_mac[ETH_ALEN];
+
+	if (is_zero_ether_addr(new_mac))
+		return -EINVAL;
+
+	if (!is_zero_ether_addr(ndev->config.mac)) {
+		if (mlx5_mpfs_del_mac(pfmdev, ndev->config.mac)) {
+			mlx5_vdpa_warn(mvdev, "failed to delete old MAC %pM from MPFS table\n",
+				       ndev->config.mac);
+			return -EIO;
+		}
+	}
+
+	if (mlx5_mpfs_add_mac(pfmdev, (u8 *)new_mac)) {
+		mlx5_vdpa_warn(mvdev, "failed to insert new MAC %pM into MPFS table\n",
+			       new_mac);
+		return -EIO;
+	}
+
+	/* backup the original mac address so that if failed to add the forward rules
+	 * we could restore it
+	 */
+	ether_addr_copy(old_mac, ndev->config.mac);
+
+	ether_addr_copy(ndev->config.mac, new_mac);
+
+	/* Need recreate the flow table entry, so that the packet could forward back
+	 */
+	mac_vlan_del(ndev, old_mac, 0, false);
+
+	if (mac_vlan_add(ndev, ndev->config.mac, 0, false)) {
+		mlx5_vdpa_warn(mvdev, "failed to insert forward rules, try to restore\n");
+
+		/* Although it hardly run here, we still need double check */
+		if (is_zero_ether_addr(old_mac)) {
+			mlx5_vdpa_warn(mvdev, "restore mac failed: Original MAC is zero\n");
+			return -EIO;
+		}
+
+		/* Try to restore original mac address to MFPS table, and try to restore
+		 * the forward rule entry.
+		 */
+		if (mlx5_mpfs_del_mac(pfmdev, ndev->config.mac)) {
+			mlx5_vdpa_warn(mvdev, "restore mac failed: delete MAC %pM from MPFS table failed\n",
+				       ndev->config.mac);
+		}
+
+		if (mlx5_mpfs_add_mac(pfmdev, old_mac)) {
+			mlx5_vdpa_warn(mvdev, "restore mac failed: insert old MAC %pM into MPFS table failed\n",
+				       old_mac);
+		}
+
+		ether_addr_copy(ndev->config.mac, old_mac);
+
+		if (mac_vlan_add(ndev, ndev->config.mac, 0, false))
+			mlx5_vdpa_warn(mvdev, "restore forward rules failed: insert forward rules failed\n");
+
+		return -EIO;
+	}
+
+	return 0;
+}
+
 static virtio_net_ctrl_ack handle_ctrl_mac(struct mlx5_vdpa_dev *mvdev, u8 cmd)
 {
 	struct mlx5_vdpa_net *ndev = to_mlx5_vdpa_ndev(mvdev);
···
 	virtio_net_ctrl_ack status = VIRTIO_NET_ERR;
 	struct mlx5_core_dev *pfmdev;
 	size_t read;
-	u8 mac[ETH_ALEN], mac_back[ETH_ALEN];
+	u8 mac[ETH_ALEN];

 	pfmdev = pci_get_drvdata(pci_physfn(mvdev->mdev->pdev));
 	switch (cmd) {
 	case VIRTIO_NET_CTRL_MAC_ADDR_SET:
-		read = vringh_iov_pull_iotlb(&cvq->vring, &cvq->riov, (void *)mac, ETH_ALEN);
+		read = vringh_iov_pull_iotlb(&cvq->vring, &cvq->riov,
+					     (void *)mac, ETH_ALEN);
 		if (read != ETH_ALEN)
 			break;

···
 			status = VIRTIO_NET_OK;
 			break;
 		}
-
-		if (is_zero_ether_addr(mac))
-			break;
-
-		if (!is_zero_ether_addr(ndev->config.mac)) {
-			if (mlx5_mpfs_del_mac(pfmdev, ndev->config.mac)) {
-				mlx5_vdpa_warn(mvdev, "failed to delete old MAC %pM from MPFS table\n",
-					       ndev->config.mac);
-				break;
-			}
-		}
-
-		if (mlx5_mpfs_add_mac(pfmdev, mac)) {
-			mlx5_vdpa_warn(mvdev, "failed to insert new MAC %pM into MPFS table\n",
-				       mac);
-			break;
-		}
-
-		/* backup the original mac address so that if failed to add the forward rules
-		 * we could restore it
-		 */
-		memcpy(mac_back, ndev->config.mac, ETH_ALEN);
-
-		memcpy(ndev->config.mac, mac, ETH_ALEN);
-
-		/* Need recreate the flow table entry, so that the packet could forward back
-		 */
-		mac_vlan_del(ndev, mac_back, 0, false);
-
-		if (mac_vlan_add(ndev, ndev->config.mac, 0, false)) {
-			mlx5_vdpa_warn(mvdev, "failed to insert forward rules, try to restore\n");
-
-			/* Although it hardly run here, we still need double check */
-			if (is_zero_ether_addr(mac_back)) {
-				mlx5_vdpa_warn(mvdev, "restore mac failed: Original MAC is zero\n");
-				break;
-			}
-
-			/* Try to restore original mac address to MFPS table, and try to restore
-			 * the forward rule entry.
-			 */
-			if (mlx5_mpfs_del_mac(pfmdev, ndev->config.mac)) {
-				mlx5_vdpa_warn(mvdev, "restore mac failed: delete MAC %pM from MPFS table failed\n",
-					       ndev->config.mac);
-			}
-
-			if (mlx5_mpfs_add_mac(pfmdev, mac_back)) {
-				mlx5_vdpa_warn(mvdev, "restore mac failed: insert old MAC %pM into MPFS table failed\n",
-					       mac_back);
-			}
-
-			memcpy(ndev->config.mac, mac_back, ETH_ALEN);
-
-			if (mac_vlan_add(ndev, ndev->config.mac, 0, false))
-				mlx5_vdpa_warn(mvdev, "restore forward rules failed: insert forward rules failed\n");
-
-			break;
-		}
-
-		status = VIRTIO_NET_OK;
+		status = mlx5_vdpa_change_mac(ndev, pfmdev, mac) ? VIRTIO_NET_ERR :
+								   VIRTIO_NET_OK;
 		break;

 	default:
···
 	struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev);
 	int err = 0;

-	if (group >= MLX5_VDPA_NUMVQ_GROUPS)
-		return -EINVAL;
-
 	mvdev->mres.group2asid[group] = asid;

 	mutex_lock(&mvdev->mres.lock);
···
 static int mlx5_vdpa_set_attr(struct vdpa_mgmt_dev *v_mdev, struct vdpa_device *dev,
 			      const struct vdpa_dev_set_config *add_config)
 {
-	struct virtio_net_config *config;
 	struct mlx5_core_dev *pfmdev;
 	struct mlx5_vdpa_dev *mvdev;
 	struct mlx5_vdpa_net *ndev;
···
 	mvdev = to_mvdev(dev);
 	ndev = to_mlx5_vdpa_ndev(mvdev);
 	mdev = mvdev->mdev;
-	config = &ndev->config;

 	down_write(&ndev->reslock);
-	if (add_config->mask & (1 << VDPA_ATTR_DEV_NET_CFG_MACADDR)) {
+
+	if (add_config->mask & BIT_ULL(VDPA_ATTR_DEV_NET_CFG_MACADDR)) {
+		if (!(ndev->mvdev.status & VIRTIO_CONFIG_S_DRIVER_OK)) {
+			ndev->mvdev.mlx_features |= BIT_ULL(VIRTIO_NET_F_MAC);
+		} else {
+			mlx5_vdpa_warn(mvdev, "device running, skip updating MAC\n");
+			err = -EBUSY;
+			goto out;
+		}
 		pfmdev = pci_get_drvdata(pci_physfn(mdev->pdev));
-		err = mlx5_mpfs_add_mac(pfmdev, config->mac);
-		if (!err)
-			ether_addr_copy(config->mac, add_config->net.mac);
+		err = mlx5_vdpa_change_mac(ndev, pfmdev,
+					   (u8 *)add_config->net.mac);
 	}

+out:
 	up_write(&ndev->reslock);
 	return err;
 }
-6
drivers/vdpa/vdpa_sim/vdpa_sim.c
···
 	struct vhost_iotlb *iommu;
 	int i;

-	if (group > vdpasim->dev_attr.ngroups)
-		return -EINVAL;
-
-	if (asid >= vdpasim->dev_attr.nas)
-		return -EINVAL;
-
 	iommu = &vdpasim->iommu[asid];

 	mutex_lock(&vdpasim->mutex);
+8 -19
drivers/vdpa/vdpa_user/iova_domain.c
···
 	vduse_domain_free_iova(iovad, dma_addr, size);
 }

-void *vduse_domain_alloc_coherent(struct vduse_iova_domain *domain,
-				  size_t size, dma_addr_t *dma_addr,
-				  gfp_t flag)
+dma_addr_t vduse_domain_alloc_coherent(struct vduse_iova_domain *domain,
+				       size_t size, void *orig)
 {
 	struct iova_domain *iovad = &domain->consistent_iovad;
 	unsigned long limit = domain->iova_limit;
 	dma_addr_t iova = vduse_domain_alloc_iova(iovad, size, limit);
-	void *orig = alloc_pages_exact(size, flag);

-	if (!iova || !orig)
-		goto err;
+	if (!iova)
+		return DMA_MAPPING_ERROR;

 	spin_lock(&domain->iotlb_lock);
 	if (vduse_iotlb_add_range(domain, (u64)iova, (u64)iova + size - 1,
···
 	}
 	spin_unlock(&domain->iotlb_lock);

-	*dma_addr = iova;
+	return iova;

-	return orig;
 err:
-	*dma_addr = DMA_MAPPING_ERROR;
-	if (orig)
-		free_pages_exact(orig, size);
-	if (iova)
-		vduse_domain_free_iova(iovad, iova, size);
+	vduse_domain_free_iova(iovad, iova, size);

-	return NULL;
+	return DMA_MAPPING_ERROR;
 }

 void vduse_domain_free_coherent(struct vduse_iova_domain *domain, size_t size,
-				void *vaddr, dma_addr_t dma_addr,
-				unsigned long attrs)
+				dma_addr_t dma_addr, unsigned long attrs)
 {
 	struct iova_domain *iovad = &domain->consistent_iovad;
 	struct vhost_iotlb_map *map;
 	struct vdpa_map_file *map_file;
-	phys_addr_t pa;

 	spin_lock(&domain->iotlb_lock);
 	map = vhost_iotlb_itree_first(domain->iotlb, (u64)dma_addr,
···
 	map_file = (struct vdpa_map_file *)map->opaque;
 	fput(map_file->file);
 	kfree(map_file);
-	pa = map->addr;
 	vhost_iotlb_map_free(domain->iotlb, map);
 	spin_unlock(&domain->iotlb_lock);

 	vduse_domain_free_iova(iovad, dma_addr, size);
-	free_pages_exact(phys_to_virt(pa), size);
 }

 static vm_fault_t vduse_domain_mmap_fault(struct vm_fault *vmf)
+3 -5
drivers/vdpa/vdpa_user/iova_domain.h
···
 			     dma_addr_t dma_addr, size_t size,
 			     enum dma_data_direction dir, unsigned long attrs);

-void *vduse_domain_alloc_coherent(struct vduse_iova_domain *domain,
-				  size_t size, dma_addr_t *dma_addr,
-				  gfp_t flag);
+dma_addr_t vduse_domain_alloc_coherent(struct vduse_iova_domain *domain,
+				       size_t size, void *orig);

 void vduse_domain_free_coherent(struct vduse_iova_domain *domain, size_t size,
-				void *vaddr, dma_addr_t dma_addr,
-				unsigned long attrs);
+				dma_addr_t dma_addr, unsigned long attrs);

 void vduse_domain_reset_bounce_map(struct vduse_iova_domain *domain);

+392 -132
drivers/vdpa/vdpa_user/vduse_dev.c
···
  */

 #include "linux/virtio_net.h"
+#include <linux/cleanup.h>
 #include <linux/init.h>
 #include <linux/module.h>
 #include <linux/cdev.h>
···
 #include <linux/uio.h>
 #include <linux/vdpa.h>
 #include <linux/nospec.h>
+#include <linux/virtio.h>
 #include <linux/vmalloc.h>
 #include <linux/sched/mm.h>
 #include <uapi/linux/vduse.h>
···
 #define DRV_LICENSE "GPL v2"

 #define VDUSE_DEV_MAX (1U << MINORBITS)
+#define VDUSE_DEV_MAX_GROUPS 0xffff
+#define VDUSE_DEV_MAX_AS 0xffff
 #define VDUSE_MAX_BOUNCE_SIZE (1024 * 1024 * 1024)
 #define VDUSE_MIN_BOUNCE_SIZE (1024 * 1024)
 #define VDUSE_BOUNCE_SIZE (64 * 1024 * 1024)
···
 #define VDUSE_MSG_DEFAULT_TIMEOUT 30

 #define IRQ_UNBOUND -1
+
+/*
+ * VDUSE instance have not asked the vduse API version, so assume 0.
+ *
+ * Old devices may not ask for the device version and assume it is 0. Keep
+ * this value for these. From the moment the VDUSE instance ask for the
+ * version, convert to the latests supported one and continue regular flow
+ */
+#define VDUSE_API_VERSION_NOT_ASKED U64_MAX

 struct vduse_virtqueue {
 	u16 index;
···
 	struct vdpa_vq_state state;
 	bool ready;
 	bool kicked;
+	u32 group;
 	spinlock_t kick_lock;
 	spinlock_t irq_lock;
 	struct eventfd_ctx *kickfd;
···
 	struct mm_struct *mm;
 };

+struct vduse_as {
+	struct vduse_iova_domain *domain;
+	struct vduse_umem *umem;
+	struct mutex mem_lock;
+};
+
+struct vduse_vq_group {
+	rwlock_t as_lock;
+	struct vduse_as *as; /* Protected by as_lock */
+	struct vduse_dev *dev;
+};
+
 struct vduse_dev {
 	struct vduse_vdpa *vdev;
 	struct device *dev;
 	struct vduse_virtqueue **vqs;
-	struct vduse_iova_domain *domain;
+	struct vduse_as *as;
 	char *name;
 	struct mutex lock;
 	spinlock_t msg_lock;
···
 	u8 status;
 	u32 vq_num;
 	u32 vq_align;
-	struct vduse_umem *umem;
-	struct mutex mem_lock;
+	u32 ngroups;
+	u32 nas;
+	struct vduse_vq_group *groups;
 	unsigned int bounce_size;
 	struct mutex domain_lock;
 };
···
 	return vduse_dev_msg_sync(dev, &msg);
 }

-static int vduse_dev_update_iotlb(struct vduse_dev *dev,
+static int vduse_dev_update_iotlb(struct vduse_dev *dev, u32 asid,
 				  u64 start, u64 last)
 {
 	struct vduse_dev_msg msg = { 0 };
···
 		return -EINVAL;

 	msg.req.type = VDUSE_UPDATE_IOTLB;
-	msg.req.iova.start = start;
-	msg.req.iova.last = last;
+	if (dev->api_version < VDUSE_API_VERSION_1) {
+		msg.req.iova.start = start;
+		msg.req.iova.last = last;
+	} else {
+		msg.req.iova_v2.start = start;
+		msg.req.iova_v2.last = last;
+		msg.req.iova_v2.asid = asid;
+	}

 	return vduse_dev_msg_sync(dev, &msg);
 }
···
 static void vduse_dev_reset(struct vduse_dev *dev)
 {
 	int i;
-	struct vduse_iova_domain *domain = dev->domain;

 	/* The coherent mappings are handled in vduse_dev_free_coherent() */
-	if (domain && domain->bounce_map)
-		vduse_domain_reset_bounce_map(domain);
+	for (i = 0; i < dev->nas; i++) {
+		struct vduse_iova_domain *domain = dev->as[i].domain;
+
+		if (domain && domain->bounce_map)
+			vduse_domain_reset_bounce_map(domain);
+	}

 	down_write(&dev->rwsem);

···
 		vq->state.packed.last_used_idx = state->packed.last_used_idx;
 	} else
 		vq->state.split.avail_index = state->split.avail_index;
+
+	return 0;
+}
+
+static u32 vduse_get_vq_group(struct vdpa_device *vdpa, u16 idx)
+{
+	struct vduse_dev *dev = vdpa_to_vduse(vdpa);
+
+	if (dev->api_version < VDUSE_API_VERSION_1)
+		return 0;
+
+	return dev->vqs[idx]->group;
+}
+
+static union virtio_map vduse_get_vq_map(struct vdpa_device *vdpa, u16 idx)
+{
+	struct vduse_dev *dev = vdpa_to_vduse(vdpa);
+	u32 vq_group = vduse_get_vq_group(vdpa, idx);
+	union virtio_map ret = {
+		.group = &dev->groups[vq_group],
+	};
+
+	return ret;
+}
+
+DEFINE_GUARD(vq_group_as_read_lock, struct vduse_vq_group *,
+	if (_T->dev->nas > 1)
+		read_lock(&_T->as_lock),
+	if (_T->dev->nas > 1)
+		read_unlock(&_T->as_lock))
+
+DEFINE_GUARD(vq_group_as_write_lock, struct vduse_vq_group *,
+	if (_T->dev->nas > 1)
+		write_lock(&_T->as_lock),
+	if (_T->dev->nas > 1)
+		write_unlock(&_T->as_lock))
+
+static int vduse_set_group_asid(struct vdpa_device *vdpa, unsigned int group,
+				unsigned int asid)
+{
+	struct vduse_dev *dev = vdpa_to_vduse(vdpa);
+	struct vduse_dev_msg msg = { 0 };
+	int r;
+
+	if (dev->api_version < VDUSE_API_VERSION_1)
+		return -EINVAL;
+
+	msg.req.type = VDUSE_SET_VQ_GROUP_ASID;
+	msg.req.vq_group_asid.group = group;
+	msg.req.vq_group_asid.asid = asid;
+
+	r = vduse_dev_msg_sync(dev, &msg);
+	if (r < 0)
+		return r;
+
+	guard(vq_group_as_write_lock)(&dev->groups[group]);
+	dev->groups[group].as = &dev->as[asid];

 	return 0;
 }
···
 	struct vduse_dev *dev = vdpa_to_vduse(vdpa);
 	int ret;

-	ret = vduse_domain_set_map(dev->domain, iotlb);
+	ret = vduse_domain_set_map(dev->as[asid].domain, iotlb);
 	if (ret)
 		return ret;

-	ret = vduse_dev_update_iotlb(dev, 0ULL, ULLONG_MAX);
+	ret = vduse_dev_update_iotlb(dev, asid, 0ULL, ULLONG_MAX);
 	if (ret) {
-		vduse_domain_clear_map(dev->domain, iotlb);
+		vduse_domain_clear_map(dev->as[asid].domain, iotlb);
 		return ret;
 	}

···
 	.set_vq_cb = vduse_vdpa_set_vq_cb,
 	.set_vq_num = vduse_vdpa_set_vq_num,
 	.get_vq_size = vduse_vdpa_get_vq_size,
+	.get_vq_group = vduse_get_vq_group,
 	.set_vq_ready = vduse_vdpa_set_vq_ready,
 	.get_vq_ready = vduse_vdpa_get_vq_ready,
 	.set_vq_state = vduse_vdpa_set_vq_state,
···
 	.get_vq_affinity = vduse_vdpa_get_vq_affinity,
 	.reset = vduse_vdpa_reset,
 	.set_map = vduse_vdpa_set_map,
+	.set_group_asid = vduse_set_group_asid,
+	.get_vq_map = vduse_get_vq_map,
 	.free = vduse_vdpa_free,
 };

···
 					 dma_addr_t dma_addr, size_t size,
 					 enum dma_data_direction dir)
 {
-	struct vduse_iova_domain *domain = token.iova_domain;
+	struct vduse_iova_domain *domain;

+	if (!token.group)
+		return;
+
+	guard(vq_group_as_read_lock)(token.group);
+	domain = token.group->as->domain;
 	vduse_domain_sync_single_for_device(domain, dma_addr, size, dir);
 }

···
 				      dma_addr_t dma_addr, size_t size,
 				      enum dma_data_direction dir)
 {
-	struct vduse_iova_domain *domain = token.iova_domain;
+	struct vduse_iova_domain *domain;

+	if (!token.group)
+		return;
+
+	guard(vq_group_as_read_lock)(token.group);
+	domain = token.group->as->domain;
 	vduse_domain_sync_single_for_cpu(domain, dma_addr, size, dir);
 }

···
 				     enum dma_data_direction dir,
 				     unsigned long attrs)
 {
-	struct vduse_iova_domain *domain = token.iova_domain;
+	struct vduse_iova_domain *domain;

+	if (!token.group)
+		return DMA_MAPPING_ERROR;
+
+	guard(vq_group_as_read_lock)(token.group);
+	domain = token.group->as->domain;
 	return vduse_domain_map_page(domain, page, offset, size, dir, attrs);
 }

···
 				 size_t size, enum dma_data_direction dir,
 				 unsigned long attrs)
 {
-	struct vduse_iova_domain *domain = token.iova_domain;
+	struct vduse_iova_domain *domain;

-	return vduse_domain_unmap_page(domain, dma_addr, size, dir, attrs);
+	if (!token.group)
+		return;
+
+	guard(vq_group_as_read_lock)(token.group);
+	domain = token.group->as->domain;
+	vduse_domain_unmap_page(domain, dma_addr, size, dir, attrs);
 }

 static void *vduse_dev_alloc_coherent(union virtio_map token, size_t size,
 				      dma_addr_t *dma_addr, gfp_t flag)
 {
-	struct vduse_iova_domain *domain = token.iova_domain;
-	unsigned long iova;
 	void *addr;

 	*dma_addr = DMA_MAPPING_ERROR;
-	addr = vduse_domain_alloc_coherent(domain, size,
-					   (dma_addr_t *)&iova, flag);
+	if (!token.group)
+		return NULL;
+
+	addr = alloc_pages_exact(size, flag);
 	if (!addr)
 		return NULL;

-	*dma_addr = (dma_addr_t)iova;
+	{
+		struct vduse_iova_domain *domain;
+
+		guard(vq_group_as_read_lock)(token.group);
+		domain = token.group->as->domain;
+		*dma_addr = vduse_domain_alloc_coherent(domain, size, addr);
+		if (*dma_addr == DMA_MAPPING_ERROR)
+			goto err;
+	}

 	return addr;
+
+err:
+	free_pages_exact(addr, size);
+	return NULL;
 }

 static void vduse_dev_free_coherent(union virtio_map token, size_t size,
 				    void *vaddr, dma_addr_t dma_addr,
 				    unsigned long attrs)
 {
-	struct vduse_iova_domain *domain = token.iova_domain;
+	if (!token.group)
+		return;

-	vduse_domain_free_coherent(domain, size, vaddr, dma_addr, attrs);
+	{
+		struct vduse_iova_domain *domain;
+
+		guard(vq_group_as_read_lock)(token.group);
+		domain = token.group->as->domain;
+		vduse_domain_free_coherent(domain, size, dma_addr, attrs);
+	}
+
+	free_pages_exact(vaddr, size);
 }

 static bool vduse_dev_need_sync(union virtio_map token, dma_addr_t dma_addr)
 {
-	struct vduse_iova_domain *domain = token.iova_domain;
+	if (!token.group)
+		return false;

-	return dma_addr < domain->bounce_size;
+	guard(vq_group_as_read_lock)(token.group);
+	return dma_addr < token.group->as->domain->bounce_size;
 }

 static int vduse_dev_mapping_error(union virtio_map token, dma_addr_t dma_addr)
···

 static size_t vduse_dev_max_mapping_size(union virtio_map token)
 {
-	struct vduse_iova_domain *domain = token.iova_domain;
+	if (!token.group)
+		return 0;

-	return domain->bounce_size;
+	guard(vq_group_as_read_lock)(token.group);
+	return token.group->as->domain->bounce_size;
 }

 static const struct virtio_map_ops vduse_map_ops = {
···
 	return ret;
 }

-static int vduse_dev_dereg_umem(struct vduse_dev *dev,
+static int vduse_dev_dereg_umem(struct vduse_dev *dev, u32 asid,
 				u64 iova, u64 size)
 {
 	int ret;

-	mutex_lock(&dev->mem_lock);
+	mutex_lock(&dev->as[asid].mem_lock);
 	ret = -ENOENT;
-	if (!dev->umem)
+	if (!dev->as[asid].umem)
 		goto unlock;

 	ret = -EINVAL;
-	if (!dev->domain)
+	if (!dev->as[asid].domain)
 		goto unlock;

-	if (dev->umem->iova != iova || size != dev->domain->bounce_size)
+	if (dev->as[asid].umem->iova != iova ||
+	    size != dev->as[asid].domain->bounce_size)
 		goto unlock;

-	vduse_domain_remove_user_bounce_pages(dev->domain);
-	unpin_user_pages_dirty_lock(dev->umem->pages,
-				    dev->umem->npages, true);
-	atomic64_sub(dev->umem->npages, &dev->umem->mm->pinned_vm);
-	mmdrop(dev->umem->mm);
-	vfree(dev->umem->pages);
-	kfree(dev->umem);
-	dev->umem = NULL;
+	vduse_domain_remove_user_bounce_pages(dev->as[asid].domain);
+	unpin_user_pages_dirty_lock(dev->as[asid].umem->pages,
+				    dev->as[asid].umem->npages, true);
+	atomic64_sub(dev->as[asid].umem->npages, &dev->as[asid].umem->mm->pinned_vm);
+	mmdrop(dev->as[asid].umem->mm);
+	vfree(dev->as[asid].umem->pages);
+	kfree(dev->as[asid].umem);
+	dev->as[asid].umem = NULL;
 	ret = 0;
 unlock:
-	mutex_unlock(&dev->mem_lock);
+	mutex_unlock(&dev->as[asid].mem_lock);
 	return ret;
 }

 static int vduse_dev_reg_umem(struct vduse_dev *dev,
-			      u64 iova, u64 uaddr, u64 size)
+			      u32 asid, u64 iova, u64 uaddr, u64 size)
 {
 	struct page **page_list = NULL;
 	struct vduse_umem *umem = NULL;
···
 	unsigned long npages, lock_limit;
 	int ret;

-	if (!dev->domain || !dev->domain->bounce_map ||
-	    size != dev->domain->bounce_size ||
+	if (!dev->as[asid].domain || !dev->as[asid].domain->bounce_map ||
+	    size != dev->as[asid].domain->bounce_size ||
 	    iova != 0 || uaddr & ~PAGE_MASK)
 		return -EINVAL;

-	mutex_lock(&dev->mem_lock);
+	mutex_lock(&dev->as[asid].mem_lock);
 	ret = -EEXIST;
-	if (dev->umem)
+	if (dev->as[asid].umem)
 		goto unlock;

 	ret = -ENOMEM;
···
 		goto out;
 	}

-	ret = vduse_domain_add_user_bounce_pages(dev->domain,
+	ret = vduse_domain_add_user_bounce_pages(dev->as[asid].domain,
 						 page_list, pinned);
 	if (ret)
 		goto out;
···
 	umem->mm = current->mm;
 	mmgrab(current->mm);

-	dev->umem = umem;
+	dev->as[asid].umem = umem;
 out:
 	if (ret && pinned > 0)
 		unpin_user_pages(page_list, pinned);
···
 		vfree(page_list);
 		kfree(umem);
 	}
-	mutex_unlock(&dev->mem_lock);
+	mutex_unlock(&dev->as[asid].mem_lock);
 	return ret;
 }

···
 	vq->irq_effective_cpu = curr_cpu;
 }

+static int vduse_dev_iotlb_entry(struct vduse_dev *dev,
+				 struct vduse_iotlb_entry_v2 *entry,
+				 struct file **f, uint64_t *capability)
+{
+	u32 asid;
+	int r = -EINVAL;
+	struct vhost_iotlb_map *map;
+
+	if (entry->start > entry->last || entry->asid >= dev->nas)
+		return -EINVAL;
+
+	asid = array_index_nospec(entry->asid, dev->nas);
+	mutex_lock(&dev->domain_lock);
+
+	if (!dev->as[asid].domain)
+		goto out;
+
+	spin_lock(&dev->as[asid].domain->iotlb_lock);
+	map = vhost_iotlb_itree_first(dev->as[asid].domain->iotlb,
+				      entry->start, entry->last);
+	if (map) {
+		if (f) {
+			const struct vdpa_map_file *map_file;
+
+			map_file = (struct vdpa_map_file *)map->opaque;
+			entry->offset = map_file->offset;
+			*f = get_file(map_file->file);
+		}
+		entry->start = map->start;
+		entry->last = map->last;
+		entry->perm = map->perm;
+		if (capability) {
+			*capability = 0;
+
+			if (dev->as[asid].domain->bounce_map && map->start == 0 &&
+			    map->last == dev->as[asid].domain->bounce_size - 1)
+				*capability |= VDUSE_IOVA_CAP_UMEM;
+		}
+
+		r = 0;
+	}
+	spin_unlock(&dev->as[asid].domain->iotlb_lock);
+
+out:
+	mutex_unlock(&dev->domain_lock);
+	return r;
+}
+
 static long vduse_dev_ioctl(struct file *file, unsigned int cmd,
 			    unsigned long arg)
 {
···
 		return -EPERM;

 	switch (cmd) {
-	case VDUSE_IOTLB_GET_FD: {
-		struct vduse_iotlb_entry entry;
-		struct vhost_iotlb_map *map;
-		struct vdpa_map_file *map_file;
+	case VDUSE_IOTLB_GET_FD:
+	case VDUSE_IOTLB_GET_FD2: {
+		struct vduse_iotlb_entry_v2 entry = {0};
 		struct file *f = NULL;

+		ret = -ENOIOCTLCMD;
+		if (dev->api_version < VDUSE_API_VERSION_1 &&
+		    cmd == VDUSE_IOTLB_GET_FD2)
+			break;
+
 		ret = -EFAULT;
-		if (copy_from_user(&entry, argp, sizeof(entry)))
+		if (copy_from_user(&entry, argp, _IOC_SIZE(cmd)))
 			break;

 		ret = -EINVAL;
-		if (entry.start > entry.last)
+		if (!is_mem_zero((const char *)entry.reserved,
+				 sizeof(entry.reserved)))
 			break;

-		mutex_lock(&dev->domain_lock);
-		if (!dev->domain) {
-			mutex_unlock(&dev->domain_lock);
+		ret = vduse_dev_iotlb_entry(dev, &entry, &f, NULL);
+		if (ret)
 			break;
-		}
-		spin_lock(&dev->domain->iotlb_lock);
-		map = vhost_iotlb_itree_first(dev->domain->iotlb,
-					      entry.start, entry.last);
-		if (map) {
-			map_file = (struct vdpa_map_file *)map->opaque;
-			f = get_file(map_file->file);
-			entry.offset = map_file->offset;
-			entry.start = map->start;
-			entry.last = map->last;
-			entry.perm = map->perm;
-		}
-
spin_unlock(&dev->domain->iotlb_lock); 1386 - mutex_unlock(&dev->domain_lock); 1187 + 1387 1188 ret = -EINVAL; 1388 1189 if (!f) 1389 1190 break; 1390 1191 1391 - ret = -EFAULT; 1392 - if (copy_to_user(argp, &entry, sizeof(entry))) { 1192 + ret = copy_to_user(argp, &entry, _IOC_SIZE(cmd)); 1193 + if (ret) { 1194 + ret = -EFAULT; 1393 1195 fput(f); 1394 1196 break; 1395 1197 } ··· 1434 1252 if (config.index >= dev->vq_num) 1435 1253 break; 1436 1254 1437 - if (!is_mem_zero((const char *)config.reserved, 1438 - sizeof(config.reserved))) 1255 + if (dev->api_version < VDUSE_API_VERSION_1) { 1256 + if (config.group) 1257 + break; 1258 + } else { 1259 + if (config.group >= dev->ngroups) 1260 + break; 1261 + if (dev->status & VIRTIO_CONFIG_S_DRIVER_OK) 1262 + break; 1263 + } 1264 + 1265 + if (config.reserved1 || 1266 + !is_mem_zero((const char *)config.reserved2, 1267 + sizeof(config.reserved2))) 1439 1268 break; 1440 1269 1441 1270 index = array_index_nospec(config.index, dev->vq_num); 1442 1271 dev->vqs[index]->num_max = config.max_size; 1272 + dev->vqs[index]->group = config.group; 1443 1273 ret = 0; 1444 1274 break; 1445 1275 } ··· 1530 1336 } 1531 1337 case VDUSE_IOTLB_REG_UMEM: { 1532 1338 struct vduse_iova_umem umem; 1339 + u32 asid; 1533 1340 1534 1341 ret = -EFAULT; 1535 1342 if (copy_from_user(&umem, argp, sizeof(umem))) ··· 1538 1343 1539 1344 ret = -EINVAL; 1540 1345 if (!is_mem_zero((const char *)umem.reserved, 1541 - sizeof(umem.reserved))) 1346 + sizeof(umem.reserved)) || 1347 + (dev->api_version < VDUSE_API_VERSION_1 && 1348 + umem.asid != 0) || umem.asid >= dev->nas) 1542 1349 break; 1543 1350 1544 1351 mutex_lock(&dev->domain_lock); 1545 - ret = vduse_dev_reg_umem(dev, umem.iova, 1352 + asid = array_index_nospec(umem.asid, dev->nas); 1353 + ret = vduse_dev_reg_umem(dev, asid, umem.iova, 1546 1354 umem.uaddr, umem.size); 1547 1355 mutex_unlock(&dev->domain_lock); 1548 1356 break; 1549 1357 } 1550 1358 case VDUSE_IOTLB_DEREG_UMEM: { 1551 1359 struct 
vduse_iova_umem umem; 1360 + u32 asid; 1552 1361 1553 1362 ret = -EFAULT; 1554 1363 if (copy_from_user(&umem, argp, sizeof(umem))) ··· 1560 1361 1561 1362 ret = -EINVAL; 1562 1363 if (!is_mem_zero((const char *)umem.reserved, 1563 - sizeof(umem.reserved))) 1364 + sizeof(umem.reserved)) || 1365 + (dev->api_version < VDUSE_API_VERSION_1 && 1366 + umem.asid != 0) || 1367 + umem.asid >= dev->nas) 1564 1368 break; 1369 + 1565 1370 mutex_lock(&dev->domain_lock); 1566 - ret = vduse_dev_dereg_umem(dev, umem.iova, 1371 + asid = array_index_nospec(umem.asid, dev->nas); 1372 + ret = vduse_dev_dereg_umem(dev, asid, umem.iova, 1567 1373 umem.size); 1568 1374 mutex_unlock(&dev->domain_lock); 1569 1375 break; 1570 1376 } 1571 1377 case VDUSE_IOTLB_GET_INFO: { 1572 1378 struct vduse_iova_info info; 1573 - struct vhost_iotlb_map *map; 1379 + struct vduse_iotlb_entry_v2 entry; 1574 1380 1575 1381 ret = -EFAULT; 1576 1382 if (copy_from_user(&info, argp, sizeof(info))) 1577 - break; 1578 - 1579 - ret = -EINVAL; 1580 - if (info.start > info.last) 1581 1383 break; 1582 1384 1583 1385 if (!is_mem_zero((const char *)info.reserved, 1584 1386 sizeof(info.reserved))) 1585 1387 break; 1586 1388 1587 - mutex_lock(&dev->domain_lock); 1588 - if (!dev->domain) { 1589 - mutex_unlock(&dev->domain_lock); 1389 + if (dev->api_version < VDUSE_API_VERSION_1) { 1390 + if (info.asid) 1391 + break; 1392 + } else if (info.asid >= dev->nas) 1590 1393 break; 1591 - } 1592 - spin_lock(&dev->domain->iotlb_lock); 1593 - map = vhost_iotlb_itree_first(dev->domain->iotlb, 1594 - info.start, info.last); 1595 - if (map) { 1596 - info.start = map->start; 1597 - info.last = map->last; 1598 - info.capability = 0; 1599 - if (dev->domain->bounce_map && map->start == 0 && 1600 - map->last == dev->domain->bounce_size - 1) 1601 - info.capability |= VDUSE_IOVA_CAP_UMEM; 1602 - } 1603 - spin_unlock(&dev->domain->iotlb_lock); 1604 - mutex_unlock(&dev->domain_lock); 1605 - if (!map) 1394 + 1395 + entry.start = info.start; 1396 + 
entry.last = info.last; 1397 + entry.asid = info.asid; 1398 + ret = vduse_dev_iotlb_entry(dev, &entry, NULL, 1399 + &info.capability); 1400 + if (ret < 0) 1606 1401 break; 1402 + 1403 + info.start = entry.start; 1404 + info.last = entry.last; 1405 + info.asid = entry.asid; 1607 1406 1608 1407 ret = -EFAULT; 1609 1408 if (copy_to_user(argp, &info, sizeof(info))) ··· 1623 1426 struct vduse_dev *dev = file->private_data; 1624 1427 1625 1428 mutex_lock(&dev->domain_lock); 1626 - if (dev->domain) 1627 - vduse_dev_dereg_umem(dev, 0, dev->domain->bounce_size); 1429 + for (int i = 0; i < dev->nas; i++) 1430 + if (dev->as[i].domain) 1431 + vduse_dev_dereg_umem(dev, i, 0, 1432 + dev->as[i].domain->bounce_size); 1628 1433 mutex_unlock(&dev->domain_lock); 1629 1434 spin_lock(&dev->msg_lock); 1630 1435 /* Make sure the inflight messages can processed after reconncection */ ··· 1845 1646 return NULL; 1846 1647 1847 1648 mutex_init(&dev->lock); 1848 - mutex_init(&dev->mem_lock); 1849 1649 mutex_init(&dev->domain_lock); 1850 1650 spin_lock_init(&dev->msg_lock); 1851 1651 INIT_LIST_HEAD(&dev->send_list); ··· 1895 1697 idr_remove(&vduse_idr, dev->minor); 1896 1698 kvfree(dev->config); 1897 1699 vduse_dev_deinit_vqs(dev); 1898 - if (dev->domain) 1899 - vduse_domain_destroy(dev->domain); 1700 + for (int i = 0; i < dev->nas; i++) { 1701 + if (dev->as[i].domain) 1702 + vduse_domain_destroy(dev->as[i].domain); 1703 + } 1704 + kfree(dev->as); 1900 1705 kfree(dev->name); 1706 + kfree(dev->groups); 1901 1707 vduse_dev_destroy(dev); 1902 1708 module_put(THIS_MODULE); 1903 1709 ··· 1939 1737 return true; 1940 1738 } 1941 1739 1942 - static bool vduse_validate_config(struct vduse_dev_config *config) 1740 + static bool vduse_validate_config(struct vduse_dev_config *config, 1741 + u64 api_version) 1943 1742 { 1944 1743 if (!is_mem_zero((const char *)config->reserved, 1945 1744 sizeof(config->reserved))) 1946 1745 return false; 1746 + 1747 + if (api_version < VDUSE_API_VERSION_1 && 1748 + 
(config->ngroups || config->nas)) 1749 + return false; 1750 + 1751 + if (api_version >= VDUSE_API_VERSION_1) { 1752 + if (!config->ngroups || config->ngroups > VDUSE_DEV_MAX_GROUPS) 1753 + return false; 1754 + 1755 + if (!config->nas || config->nas > VDUSE_DEV_MAX_AS) 1756 + return false; 1757 + } 1947 1758 1948 1759 if (config->vq_align > PAGE_SIZE) 1949 1760 return false; ··· 2021 1806 2022 1807 ret = -EPERM; 2023 1808 mutex_lock(&dev->domain_lock); 2024 - if (dev->domain) 1809 + /* Assuming that if the first domain is allocated, all are allocated */ 1810 + if (dev->as[0].domain) 2025 1811 goto unlock; 2026 1812 2027 1813 ret = kstrtouint(buf, 10, &bounce_size); ··· 2074 1858 dev->device_features = config->features; 2075 1859 dev->device_id = config->device_id; 2076 1860 dev->vendor_id = config->vendor_id; 1861 + 1862 + dev->nas = (dev->api_version < VDUSE_API_VERSION_1) ? 1 : config->nas; 1863 + dev->as = kcalloc(dev->nas, sizeof(dev->as[0]), GFP_KERNEL); 1864 + if (!dev->as) 1865 + goto err_as; 1866 + for (int i = 0; i < dev->nas; i++) 1867 + mutex_init(&dev->as[i].mem_lock); 1868 + 1869 + dev->ngroups = (dev->api_version < VDUSE_API_VERSION_1) 1870 + ? 
1 1871 + : config->ngroups; 1872 + dev->groups = kcalloc(dev->ngroups, sizeof(dev->groups[0]), 1873 + GFP_KERNEL); 1874 + if (!dev->groups) 1875 + goto err_vq_groups; 1876 + for (u32 i = 0; i < dev->ngroups; ++i) { 1877 + dev->groups[i].dev = dev; 1878 + rwlock_init(&dev->groups[i].as_lock); 1879 + dev->groups[i].as = &dev->as[0]; 1880 + } 1881 + 2077 1882 dev->name = kstrdup(config->name, GFP_KERNEL); 2078 1883 if (!dev->name) 2079 1884 goto err_str; ··· 2131 1894 err_idr: 2132 1895 kfree(dev->name); 2133 1896 err_str: 1897 + kfree(dev->groups); 1898 + err_vq_groups: 1899 + kfree(dev->as); 1900 + err_as: 2134 1901 vduse_dev_destroy(dev); 2135 1902 err: 2136 1903 return ret; ··· 2150 1909 mutex_lock(&vduse_lock); 2151 1910 switch (cmd) { 2152 1911 case VDUSE_GET_API_VERSION: 1912 + if (control->api_version == VDUSE_API_VERSION_NOT_ASKED) 1913 + control->api_version = VDUSE_API_VERSION_1; 2153 1914 ret = put_user(control->api_version, (u64 __user *)argp); 2154 1915 break; 2155 1916 case VDUSE_SET_API_VERSION: { ··· 2162 1919 break; 2163 1920 2164 1921 ret = -EINVAL; 2165 - if (api_version > VDUSE_API_VERSION) 1922 + if (api_version > VDUSE_API_VERSION_1) 2166 1923 break; 2167 1924 2168 1925 ret = 0; ··· 2179 1936 break; 2180 1937 2181 1938 ret = -EINVAL; 2182 - if (vduse_validate_config(&config) == false) 1939 + if (control->api_version == VDUSE_API_VERSION_NOT_ASKED) 1940 + control->api_version = VDUSE_API_VERSION; 1941 + if (!vduse_validate_config(&config, control->api_version)) 2183 1942 break; 2184 1943 2185 1944 buf = vmemdup_user(argp + size, config.config_size); ··· 2231 1986 if (!control) 2232 1987 return -ENOMEM; 2233 1988 2234 - control->api_version = VDUSE_API_VERSION; 1989 + control->api_version = VDUSE_API_VERSION_NOT_ASKED; 2235 1990 file->private_data = control; 2236 1991 2237 1992 return 0; ··· 2262 2017 2263 2018 vdev = vdpa_alloc_device(struct vduse_vdpa, vdpa, dev->dev, 2264 2019 &vduse_vdpa_config_ops, &vduse_map_ops, 2265 - 1, 1, name, true); 
2020 + dev->ngroups, dev->nas, name, true); 2266 2021 if (IS_ERR(vdev)) 2267 2022 return PTR_ERR(vdev); 2268 2023 ··· 2277 2032 const struct vdpa_dev_set_config *config) 2278 2033 { 2279 2034 struct vduse_dev *dev; 2280 - int ret; 2035 + size_t domain_bounce_size; 2036 + int ret, i; 2281 2037 2282 2038 mutex_lock(&vduse_lock); 2283 2039 dev = vduse_find_dev(name); ··· 2292 2046 return ret; 2293 2047 2294 2048 mutex_lock(&dev->domain_lock); 2295 - if (!dev->domain) 2296 - dev->domain = vduse_domain_create(VDUSE_IOVA_SIZE - 1, 2297 - dev->bounce_size); 2298 - mutex_unlock(&dev->domain_lock); 2299 - if (!dev->domain) { 2300 - put_device(&dev->vdev->vdpa.dev); 2301 - return -ENOMEM; 2049 + ret = 0; 2050 + 2051 + domain_bounce_size = dev->bounce_size / dev->nas; 2052 + for (i = 0; i < dev->nas; ++i) { 2053 + dev->as[i].domain = vduse_domain_create(VDUSE_IOVA_SIZE - 1, 2054 + domain_bounce_size); 2055 + if (!dev->as[i].domain) { 2056 + ret = -ENOMEM; 2057 + goto err; 2058 + } 2302 2059 } 2303 2060 2304 - dev->vdev->vdpa.vmap.iova_domain = dev->domain; 2061 + mutex_unlock(&dev->domain_lock); 2062 + 2305 2063 ret = _vdpa_register_device(&dev->vdev->vdpa, dev->vq_num); 2306 - if (ret) { 2307 - put_device(&dev->vdev->vdpa.dev); 2308 - mutex_lock(&dev->domain_lock); 2309 - vduse_domain_destroy(dev->domain); 2310 - dev->domain = NULL; 2311 - mutex_unlock(&dev->domain_lock); 2312 - return ret; 2313 - } 2064 + if (ret) 2065 + goto err_register; 2314 2066 2315 2067 return 0; 2068 + 2069 + err_register: 2070 + mutex_lock(&dev->domain_lock); 2071 + 2072 + err: 2073 + for (int j = 0; j < i; j++) { 2074 + if (dev->as[j].domain) { 2075 + vduse_domain_destroy(dev->as[j].domain); 2076 + dev->as[j].domain = NULL; 2077 + } 2078 + } 2079 + mutex_unlock(&dev->domain_lock); 2080 + 2081 + put_device(&dev->vdev->vdpa.dev); 2082 + 2083 + return ret; 2316 2084 } 2317 2085 2318 2086 static void vdpa_dev_del(struct vdpa_mgmt_dev *mdev, struct vdpa_device *dev)
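Note on the vdpa_dev_add() hunk above: the new code creates one IOVA domain per address space, gives each an equal share of the configured bounce size, and on failure destroys only the domains created so far. The following is a minimal userspace sketch of that allocate-or-roll-back pattern. The `demo_*` names and the `fail_at` parameter are hypothetical stand-ins for illustration, not kernel symbols, and `malloc()` stands in for vduse_domain_create().

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct vduse_iova_domain; not the kernel type. */
struct demo_domain {
	size_t bounce_size;
};

/* Hypothetical stand-in for one entry of dev->as[]. */
struct demo_as {
	struct demo_domain *domain;
};

/*
 * Create one domain per address space, each with an equal share of the
 * total bounce size. fail_at simulates an allocation failure at that
 * index (pass -1 for no failure). On failure, every domain created so
 * far is released and its pointer reset to NULL.
 */
static int demo_create_domains(struct demo_as *as, unsigned int nas,
			       size_t total_bounce, int fail_at)
{
	size_t per_as;
	unsigned int i, j;

	if (!nas)
		return -EINVAL;

	per_as = total_bounce / nas;
	for (i = 0; i < nas; i++) {
		as[i].domain = ((int)i == fail_at) ?
			       NULL : malloc(sizeof(*as[i].domain));
		if (!as[i].domain)
			goto err;
		as[i].domain->bounce_size = per_as;
	}
	return 0;

err:
	/* Roll back only the entries before the failing index. */
	for (j = 0; j < i; j++) {
		free(as[j].domain);
		as[j].domain = NULL;
	}
	return -ENOMEM;
}
```

In the kernel hunk the same cleanup loop also serves the registration-failure path, where the loop index already equals the number of address spaces, so every domain is destroyed.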
+4 -1
drivers/vhost/vdpa.c
···
     case VHOST_VDPA_SET_GROUP_ASID:
         if (copy_from_user(&s, argp, sizeof(s)))
             return -EFAULT;
-        if (s.num >= vdpa->nas)
+        if (idx >= vdpa->ngroups || s.num >= vdpa->nas)
             return -EINVAL;
+        if (ops->get_status(vdpa) & VIRTIO_CONFIG_S_DRIVER_OK)
+            return -EBUSY;
         if (!ops->set_group_asid)
             return -EOPNOTSUPP;
         return ops->set_group_asid(vdpa, idx, s.num);
···
     if (vma->vm_end - vma->vm_start != notify.size)
         return -ENOTSUPP;

+    vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
     vm_flags_set(vma, VM_IO | VM_PFNMAP | VM_DONTEXPAND | VM_DONTDUMP);
     vma->vm_ops = &vhost_vdpa_vm_ops;
     return 0;
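Note on the VHOST_VDPA_SET_GROUP_ASID hunk above: the order of the checks determines which error userspace sees. Out-of-range indices are rejected first, then a device that is already running, then a missing callback. Below is a minimal sketch of that ordering as a standalone function. `struct demo_vdpa` and `demo_check_set_group_asid()` are hypothetical names for illustration. The status bit uses the same value as VIRTIO_CONFIG_S_DRIVER_OK (4).

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Same numeric value as VIRTIO_CONFIG_S_DRIVER_OK. */
#define DEMO_S_DRIVER_OK 4

/* Hypothetical, reduced view of the fields the ioctl path consults. */
struct demo_vdpa {
	uint32_t ngroups;          /* number of virtqueue groups */
	uint32_t nas;              /* number of address spaces */
	uint8_t status;            /* virtio device status byte */
	int has_set_group_asid;    /* nonzero if the callback is provided */
};

/* Mirrors the check order of the hunk: range, then state, then support. */
static int demo_check_set_group_asid(const struct demo_vdpa *v,
				     uint32_t group, uint32_t asid)
{
	if (group >= v->ngroups || asid >= v->nas)
		return -EINVAL;
	if (v->status & DEMO_S_DRIVER_OK)
		return -EBUSY;
	if (!v->has_set_group_asid)
		return -EOPNOTSUPP;
	return 0;
}
```

With this ordering, a bad group index on a running device reports -EINVAL rather than -EBUSY, so a caller can tell a malformed request apart from a correctly formed one issued too late.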
+4 -4
drivers/vhost/vhost.c
···
 ({ \
     int ret; \
     if (!vq->iotlb) { \
-        ret = __put_user(x, ptr); \
+        ret = put_user(x, ptr); \
     } else { \
         __typeof__(ptr) to = \
             (__typeof__(ptr)) __vhost_get_user(vq, ptr, \
                       sizeof(*ptr), VHOST_ADDR_USED); \
         if (to != NULL) \
-            ret = __put_user(x, to); \
+            ret = put_user(x, to); \
         else \
             ret = -EFAULT; \
     } \
···
 ({ \
     int ret; \
     if (!vq->iotlb) { \
-        ret = __get_user(x, ptr); \
+        ret = get_user(x, ptr); \
     } else { \
         __typeof__(ptr) from = \
             (__typeof__(ptr)) __vhost_get_user(vq, ptr, \
                                sizeof(*ptr), \
                                type); \
         if (from != NULL) \
-            ret = __get_user(x, from); \
+            ret = get_user(x, from); \
         else \
             ret = -EFAULT; \
     } \
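Note on the vhost.c hunks above: the diff itself does not state the motivation. The general difference is that put_user()/get_user() validate the user pointer themselves, while the double-underscore variants rely on the caller having done so. The sketch below models that distinction in userspace. The `demo_*` names and the offset-based "address space" are hypothetical and only illustrate the checked-versus-unchecked pattern; they are not the kernel implementation.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of a user-accessible window; not a kernel structure. */
struct demo_user_space {
	uint8_t *base;
	size_t size;
};

/* Overflow-safe range check, playing the role access_ok() plays in the kernel. */
static int demo_access_ok(const struct demo_user_space *us,
			  size_t off, size_t len)
{
	return off <= us->size && len <= us->size - off;
}

/* Unchecked store: correct only if the caller already validated the range. */
static int demo_put_unchecked(struct demo_user_space *us,
			      size_t off, uint16_t val)
{
	memcpy(us->base + off, &val, sizeof(val));
	return 0;
}

/* Checked store: validates the range on every call before writing. */
static int demo_put_checked(struct demo_user_space *us,
			    size_t off, uint16_t val)
{
	if (!demo_access_ok(us, off, sizeof(val)))
		return -EFAULT;
	return demo_put_unchecked(us, off, val);
}
```

The checked variant costs one comparison per access and no longer depends on a validation step performed elsewhere.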
+4 -1
drivers/virtio/virtio_input.c
···
 #include <linux/virtio_config.h>
 #include <linux/input.h>
 #include <linux/slab.h>
+#include <linux/dma-mapping.h>

 #include <uapi/linux/virtio_ids.h>
 #include <uapi/linux/virtio_input.h>
···
     char serial[64];
     char phys[64];
     struct virtqueue *evt, *sts;
+    __dma_from_device_group_begin();
     struct virtio_input_event evts[64];
+    __dma_from_device_group_end();
     spinlock_t lock;
     bool ready;
 };
···
     struct scatterlist sg[1];

     sg_init_one(sg, evtbuf, sizeof(*evtbuf));
-    virtqueue_add_inbuf(vi->evt, sg, 1, evtbuf, GFP_ATOMIC);
+    virtqueue_add_inbuf_cache_clean(vi->evt, sg, 1, evtbuf, GFP_ATOMIC);
 }

 static void virtinput_recv_events(struct virtqueue *vq)
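Note on the virtio_input.c hunk above: the begin/end annotations keep the device-written event buffer off any cache line that also holds CPU-written fields such as the spinlock. The layout effect can be approximated in plain C11 with `alignas`, as in the sketch below. The 64-byte `DEMO_DMA_ALIGN` value, the struct and its field names are assumptions for illustration. The kernel macros use zero-length array markers and the platform's real DMA alignment rather than this construction.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>

/* Assumed cache-line / DMA alignment for this sketch only. */
#define DEMO_DMA_ALIGN 64

/* Hypothetical structure mixing CPU-written fields with a device-written buffer. */
struct demo_input {
	int lock_before;                        /* CPU-written field */
	alignas(DEMO_DMA_ALIGN) char evts[40];  /* buffer starts on its own line */
	alignas(DEMO_DMA_ALIGN) int lock_after; /* pushed onto the next line */
};

/* Returns the index of the cache line that a given byte offset falls on. */
static size_t demo_line_of(size_t offset)
{
	return offset / DEMO_DMA_ALIGN;
}
```

Aligning the field after the buffer protects the buffer's tail, and aligning the buffer itself protects its head. The two annotations in the diff play these same two roles.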
+772 -238
drivers/virtio/virtio_ring.c
··· 67 67 #define LAST_ADD_TIME_INVALID(vq) 68 68 #endif 69 69 70 + enum vq_layout { 71 + VQ_LAYOUT_SPLIT = 0, 72 + VQ_LAYOUT_PACKED, 73 + VQ_LAYOUT_SPLIT_IN_ORDER, 74 + VQ_LAYOUT_PACKED_IN_ORDER, 75 + }; 76 + 70 77 struct vring_desc_state_split { 71 78 void *data; /* Data for callback. */ 72 79 ··· 81 74 * allocated together. So we won't stress more to the memory allocator. 82 75 */ 83 76 struct vring_desc *indir_desc; 77 + u32 total_in_len; 84 78 }; 85 79 86 80 struct vring_desc_state_packed { ··· 93 85 struct vring_packed_desc *indir_desc; 94 86 u16 num; /* Descriptor list length. */ 95 87 u16 last; /* The last desc state in a list. */ 88 + u32 total_in_len; /* In length for the skipped buffer. */ 96 89 }; 97 90 98 91 struct vring_desc_extra { ··· 168 159 size_t event_size_in_bytes; 169 160 }; 170 161 162 + struct vring_virtqueue; 163 + 164 + struct virtqueue_ops { 165 + int (*add)(struct vring_virtqueue *vq, struct scatterlist *sgs[], 166 + unsigned int total_sg, unsigned int out_sgs, 167 + unsigned int in_sgs, void *data, 168 + void *ctx, bool premapped, gfp_t gfp, 169 + unsigned long attr); 170 + void *(*get)(struct vring_virtqueue *vq, unsigned int *len, void **ctx); 171 + bool (*kick_prepare)(struct vring_virtqueue *vq); 172 + void (*disable_cb)(struct vring_virtqueue *vq); 173 + bool (*enable_cb_delayed)(struct vring_virtqueue *vq); 174 + unsigned int (*enable_cb_prepare)(struct vring_virtqueue *vq); 175 + bool (*poll)(const struct vring_virtqueue *vq, 176 + unsigned int last_used_idx); 177 + void *(*detach_unused_buf)(struct vring_virtqueue *vq); 178 + bool (*more_used)(const struct vring_virtqueue *vq); 179 + int (*resize)(struct vring_virtqueue *vq, u32 num); 180 + void (*reset)(struct vring_virtqueue *vq); 181 + }; 182 + 171 183 struct vring_virtqueue { 172 184 struct virtqueue vq; 173 - 174 - /* Is this a packed ring? */ 175 - bool packed_ring; 176 185 177 186 /* Is DMA API used? 
*/ 178 187 bool use_map_api; ··· 207 180 /* Host publishes avail event idx */ 208 181 bool event; 209 182 210 - /* Head of free buffer list. */ 183 + enum vq_layout layout; 184 + 185 + /* 186 + * Without IN_ORDER it's the head of free buffer list. With 187 + * IN_ORDER and SPLIT, it's the next available buffer 188 + * index. With IN_ORDER and PACKED, it's unused. 189 + */ 211 190 unsigned int free_head; 191 + 192 + /* 193 + * With IN_ORDER, once we see an in-order batch, this stores 194 + * this last entry, and until we return the last buffer. 195 + * After this, id is set to UINT_MAX to mark it invalid. 196 + * Unused without IN_ORDER. 197 + */ 198 + struct used_entry { 199 + u32 id; 200 + u32 len; 201 + } batch_last; 202 + 212 203 /* Number we've added since last sync. */ 213 204 unsigned int num_added; 214 205 ··· 237 192 * bits from VRING_PACKED_EVENT_F_WRAP_CTR include the used wrap counter. 238 193 */ 239 194 u16 last_used_idx; 195 + 196 + /* With IN_ORDER and SPLIT, last descriptor id we used to 197 + * detach buffer. 198 + */ 199 + u16 last_used; 240 200 241 201 /* Hint for event idx: already triggered no need to disable. */ 242 202 bool event_triggered; ··· 280 230 */ 281 231 282 232 #define to_vvq(_vq) container_of_const(_vq, struct vring_virtqueue, vq) 233 + 234 + 235 + static inline bool virtqueue_is_packed(const struct vring_virtqueue *vq) 236 + { 237 + return vq->layout == VQ_LAYOUT_PACKED || 238 + vq->layout == VQ_LAYOUT_PACKED_IN_ORDER; 239 + } 240 + 241 + static inline bool virtqueue_is_in_order(const struct vring_virtqueue *vq) 242 + { 243 + return vq->layout == VQ_LAYOUT_SPLIT_IN_ORDER || 244 + vq->layout == VQ_LAYOUT_PACKED_IN_ORDER; 245 + } 283 246 284 247 static bool virtqueue_use_indirect(const struct vring_virtqueue *vq, 285 248 unsigned int total_sg) ··· 445 382 /* Map one sg entry. 
*/ 446 383 static int vring_map_one_sg(const struct vring_virtqueue *vq, struct scatterlist *sg, 447 384 enum dma_data_direction direction, dma_addr_t *addr, 448 - u32 *len, bool premapped) 385 + u32 *len, bool premapped, unsigned long attr) 449 386 { 450 387 if (premapped) { 451 388 *addr = sg_dma_address(sg); ··· 473 410 */ 474 411 *addr = virtqueue_map_page_attrs(&vq->vq, sg_page(sg), 475 412 sg->offset, sg->length, 476 - direction, 0); 413 + direction, attr); 477 414 478 415 if (vring_mapping_error(vq, *addr)) 479 416 return -ENOMEM; ··· 496 433 { 497 434 vq->vq.num_free = num; 498 435 499 - if (vq->packed_ring) 436 + if (virtqueue_is_packed(vq)) 500 437 vq->last_used_idx = 0 | (1 << VRING_PACKED_EVENT_F_WRAP_CTR); 501 438 else 502 439 vq->last_used_idx = 0; 440 + 441 + vq->last_used = 0; 503 442 504 443 vq->event_triggered = false; 505 444 vq->num_added = 0; ··· 541 476 return extra->next; 542 477 } 543 478 544 - static struct vring_desc *alloc_indirect_split(struct virtqueue *_vq, 479 + static struct vring_desc *alloc_indirect_split(struct vring_virtqueue *vq, 545 480 unsigned int total_sg, 546 481 gfp_t gfp) 547 482 { ··· 570 505 return desc; 571 506 } 572 507 573 - static inline unsigned int virtqueue_add_desc_split(struct virtqueue *vq, 508 + static inline unsigned int virtqueue_add_desc_split(struct vring_virtqueue *vq, 574 509 struct vring_desc *desc, 575 510 struct vring_desc_extra *extra, 576 511 unsigned int i, ··· 578 513 unsigned int len, 579 514 u16 flags, bool premapped) 580 515 { 516 + struct virtio_device *vdev = vq->vq.vdev; 581 517 u16 next; 582 518 583 - desc[i].flags = cpu_to_virtio16(vq->vdev, flags); 584 - desc[i].addr = cpu_to_virtio64(vq->vdev, addr); 585 - desc[i].len = cpu_to_virtio32(vq->vdev, len); 519 + desc[i].flags = cpu_to_virtio16(vdev, flags); 520 + desc[i].addr = cpu_to_virtio64(vdev, addr); 521 + desc[i].len = cpu_to_virtio32(vdev, len); 586 522 587 523 extra[i].addr = premapped ? 
DMA_MAPPING_ERROR : addr; 588 524 extra[i].len = len; ··· 591 525 592 526 next = extra[i].next; 593 527 594 - desc[i].next = cpu_to_virtio16(vq->vdev, next); 528 + desc[i].next = cpu_to_virtio16(vdev, next); 595 529 596 530 return next; 597 531 } 598 532 599 - static inline int virtqueue_add_split(struct virtqueue *_vq, 533 + static inline int virtqueue_add_split(struct vring_virtqueue *vq, 600 534 struct scatterlist *sgs[], 601 535 unsigned int total_sg, 602 536 unsigned int out_sgs, ··· 604 538 void *data, 605 539 void *ctx, 606 540 bool premapped, 607 - gfp_t gfp) 541 + gfp_t gfp, 542 + unsigned long attr) 608 543 { 609 - struct vring_virtqueue *vq = to_vvq(_vq); 610 544 struct vring_desc_extra *extra; 611 545 struct scatterlist *sg; 612 546 struct vring_desc *desc; 613 - unsigned int i, n, avail, descs_used, prev, err_idx; 547 + unsigned int i, n, avail, descs_used, err_idx, sg_count = 0; 548 + /* Total length for in-order */ 549 + unsigned int total_in_len = 0; 614 550 int head; 615 551 bool indirect; 616 552 ··· 633 565 head = vq->free_head; 634 566 635 567 if (virtqueue_use_indirect(vq, total_sg)) 636 - desc = alloc_indirect_split(_vq, total_sg, gfp); 568 + desc = alloc_indirect_split(vq, total_sg, gfp); 637 569 else { 638 570 desc = NULL; 639 571 WARN_ON_ONCE(total_sg > vq->split.vring.num && !vq->indirect); ··· 672 604 for (sg = sgs[n]; sg; sg = sg_next(sg)) { 673 605 dma_addr_t addr; 674 606 u32 len; 607 + u16 flags = 0; 675 608 676 - if (vring_map_one_sg(vq, sg, DMA_TO_DEVICE, &addr, &len, premapped)) 609 + if (++sg_count != total_sg) 610 + flags |= VRING_DESC_F_NEXT; 611 + 612 + if (vring_map_one_sg(vq, sg, DMA_TO_DEVICE, &addr, &len, 613 + premapped, attr)) 677 614 goto unmap_release; 678 615 679 - prev = i; 680 616 /* Note that we trust indirect descriptor 681 617 * table since it use stream DMA mapping. 
682 618 */ 683 - i = virtqueue_add_desc_split(_vq, desc, extra, i, addr, len, 684 - VRING_DESC_F_NEXT, 685 - premapped); 619 + i = virtqueue_add_desc_split(vq, desc, extra, i, addr, 620 + len, flags, premapped); 686 621 } 687 622 } 688 623 for (; n < (out_sgs + in_sgs); n++) { 689 624 for (sg = sgs[n]; sg; sg = sg_next(sg)) { 690 625 dma_addr_t addr; 691 626 u32 len; 627 + u16 flags = VRING_DESC_F_WRITE; 692 628 693 - if (vring_map_one_sg(vq, sg, DMA_FROM_DEVICE, &addr, &len, premapped)) 629 + if (++sg_count != total_sg) 630 + flags |= VRING_DESC_F_NEXT; 631 + 632 + if (vring_map_one_sg(vq, sg, DMA_FROM_DEVICE, &addr, &len, 633 + premapped, attr)) 694 634 goto unmap_release; 695 635 696 - prev = i; 697 636 /* Note that we trust indirect descriptor 698 637 * table since it use stream DMA mapping. 699 638 */ 700 - i = virtqueue_add_desc_split(_vq, desc, extra, i, addr, len, 701 - VRING_DESC_F_NEXT | 702 - VRING_DESC_F_WRITE, 703 - premapped); 639 + i = virtqueue_add_desc_split(vq, desc, extra, i, addr, 640 + len, flags, premapped); 641 + total_in_len += len; 704 642 } 705 643 } 706 - /* Last one doesn't continue. */ 707 - desc[prev].flags &= cpu_to_virtio16(_vq->vdev, ~VRING_DESC_F_NEXT); 708 - if (!indirect && vring_need_unmap_buffer(vq, &extra[prev])) 709 - vq->split.desc_extra[prev & (vq->split.vring.num - 1)].flags &= 710 - ~VRING_DESC_F_NEXT; 711 644 712 645 if (indirect) { 713 646 /* Now that the indirect table is filled in, map it. 
*/ ··· 718 649 if (vring_mapping_error(vq, addr)) 719 650 goto unmap_release; 720 651 721 - virtqueue_add_desc_split(_vq, vq->split.vring.desc, 652 + virtqueue_add_desc_split(vq, vq->split.vring.desc, 722 653 vq->split.desc_extra, 723 654 head, addr, 724 655 total_sg * sizeof(struct vring_desc), ··· 729 660 vq->vq.num_free -= descs_used; 730 661 731 662 /* Update free pointer */ 732 - if (indirect) 663 + if (virtqueue_is_in_order(vq)) { 664 + vq->free_head += descs_used; 665 + if (vq->free_head >= vq->split.vring.num) 666 + vq->free_head -= vq->split.vring.num; 667 + vq->split.desc_state[head].total_in_len = total_in_len; 668 + } else if (indirect) 733 669 vq->free_head = vq->split.desc_extra[head].next; 734 670 else 735 671 vq->free_head = i; ··· 749 675 /* Put entry in available array (but don't update avail->idx until they 750 676 * do sync). */ 751 677 avail = vq->split.avail_idx_shadow & (vq->split.vring.num - 1); 752 - vq->split.vring.avail->ring[avail] = cpu_to_virtio16(_vq->vdev, head); 678 + vq->split.vring.avail->ring[avail] = cpu_to_virtio16(vq->vq.vdev, head); 753 679 754 680 /* Descriptors and available array need to be set before we expose the 755 681 * new available array entries. */ 756 682 virtio_wmb(vq->weak_barriers); 757 683 vq->split.avail_idx_shadow++; 758 - vq->split.vring.avail->idx = cpu_to_virtio16(_vq->vdev, 684 + vq->split.vring.avail->idx = cpu_to_virtio16(vq->vq.vdev, 759 685 vq->split.avail_idx_shadow); 760 686 vq->num_added++; 761 687 ··· 765 691 /* This is very unlikely, but theoretically possible. Kick 766 692 * just in case. 
*/ 767 693 if (unlikely(vq->num_added == (1 << 16) - 1)) 768 - virtqueue_kick(_vq); 694 + virtqueue_kick(&vq->vq); 769 695 770 696 return 0; 771 697 ··· 791 717 return -ENOMEM; 792 718 } 793 719 794 - static bool virtqueue_kick_prepare_split(struct virtqueue *_vq) 720 + static bool virtqueue_kick_prepare_split(struct vring_virtqueue *vq) 795 721 { 796 - struct vring_virtqueue *vq = to_vvq(_vq); 797 722 u16 new, old; 798 723 bool needs_kick; 799 724 ··· 809 736 LAST_ADD_TIME_INVALID(vq); 810 737 811 738 if (vq->event) { 812 - needs_kick = vring_need_event(virtio16_to_cpu(_vq->vdev, 739 + needs_kick = vring_need_event(virtio16_to_cpu(vq->vq.vdev, 813 740 vring_avail_event(&vq->split.vring)), 814 741 new, old); 815 742 } else { 816 743 needs_kick = !(vq->split.vring.used->flags & 817 - cpu_to_virtio16(_vq->vdev, 744 + cpu_to_virtio16(vq->vq.vdev, 818 745 VRING_USED_F_NO_NOTIFY)); 819 746 } 820 747 END_USE(vq); 821 748 return needs_kick; 822 749 } 823 750 824 - static void detach_buf_split(struct vring_virtqueue *vq, unsigned int head, 825 - void **ctx) 751 + static void detach_indirect_split(struct vring_virtqueue *vq, 752 + unsigned int head) 753 + { 754 + struct vring_desc_extra *extra = vq->split.desc_extra; 755 + struct vring_desc *indir_desc = vq->split.desc_state[head].indir_desc; 756 + unsigned int j; 757 + u32 len, num; 758 + 759 + /* Free the indirect table, if any, now that it's unmapped. 
*/
+	if (!indir_desc)
+		return;
+	len = vq->split.desc_extra[head].len;
+
+	BUG_ON(!(vq->split.desc_extra[head].flags &
+			VRING_DESC_F_INDIRECT));
+	BUG_ON(len == 0 || len % sizeof(struct vring_desc));
+
+	num = len / sizeof(struct vring_desc);
+
+	extra = (struct vring_desc_extra *)&indir_desc[num];
+
+	if (vq->use_map_api) {
+		for (j = 0; j < num; j++)
+			vring_unmap_one_split(vq, &extra[j]);
+	}
+
+	kfree(indir_desc);
+	vq->split.desc_state[head].indir_desc = NULL;
+}
+
+static unsigned detach_buf_split_in_order(struct vring_virtqueue *vq,
+					  unsigned int head,
+					  void **ctx)
 {
 	struct vring_desc_extra *extra;
-	unsigned int i, j;
+	unsigned int i;
 	__virtio16 nextflag = cpu_to_virtio16(vq->vq.vdev, VRING_DESC_F_NEXT);

 	/* Clear data ptr. */
···
 	i = head;

 	while (vq->split.vring.desc[i].flags & nextflag) {
-		vring_unmap_one_split(vq, &extra[i]);
-		i = vq->split.desc_extra[i].next;
+		i = vring_unmap_one_split(vq, &extra[i]);
 		vq->vq.num_free++;
 	}

 	vring_unmap_one_split(vq, &extra[i]);
-	vq->split.desc_extra[i].next = vq->free_head;
-	vq->free_head = head;

 	/* Plus final descriptor */
 	vq->vq.num_free++;

-	if (vq->indirect) {
-		struct vring_desc *indir_desc =
-				vq->split.desc_state[head].indir_desc;
-		u32 len, num;
-
-		/* Free the indirect table, if any, now that it's unmapped. */
-		if (!indir_desc)
-			return;
-		len = vq->split.desc_extra[head].len;
-
-		BUG_ON(!(vq->split.desc_extra[head].flags &
-				VRING_DESC_F_INDIRECT));
-		BUG_ON(len == 0 || len % sizeof(struct vring_desc));
-
-		num = len / sizeof(struct vring_desc);
-
-		extra = (struct vring_desc_extra *)&indir_desc[num];
-
-		if (vq->use_map_api) {
-			for (j = 0; j < num; j++)
-				vring_unmap_one_split(vq, &extra[j]);
-		}
-
-		kfree(indir_desc);
-		vq->split.desc_state[head].indir_desc = NULL;
-	} else if (ctx) {
+	if (vq->indirect)
+		detach_indirect_split(vq, head);
+	else if (ctx)
 		*ctx = vq->split.desc_state[head].indir_desc;
-	}
+
+	return i;
+}
+
+static void detach_buf_split(struct vring_virtqueue *vq, unsigned int head,
+			     void **ctx)
+{
+	unsigned int i = detach_buf_split_in_order(vq, head, ctx);
+
+	vq->split.desc_extra[i].next = vq->free_head;
+	vq->free_head = head;
+}
+
+static bool virtqueue_poll_split(const struct vring_virtqueue *vq,
+				 unsigned int last_used_idx)
+{
+	return (u16)last_used_idx != virtio16_to_cpu(vq->vq.vdev,
+			vq->split.vring.used->idx);
 }

 static bool more_used_split(const struct vring_virtqueue *vq)
 {
-	return vq->last_used_idx != virtio16_to_cpu(vq->vq.vdev,
-			vq->split.vring.used->idx);
+	return virtqueue_poll_split(vq, vq->last_used_idx);
 }

-static void *virtqueue_get_buf_ctx_split(struct virtqueue *_vq,
+static bool more_used_split_in_order(const struct vring_virtqueue *vq)
+{
+	if (vq->batch_last.id != UINT_MAX)
+		return true;
+
+	return virtqueue_poll_split(vq, vq->last_used_idx);
+}
+
+static void *virtqueue_get_buf_ctx_split(struct vring_virtqueue *vq,
 					 unsigned int *len,
 					 void **ctx)
 {
-	struct vring_virtqueue *vq = to_vvq(_vq);
 	void *ret;
 	unsigned int i;
 	u16 last_used;
···
 	virtio_rmb(vq->weak_barriers);

 	last_used = (vq->last_used_idx & (vq->split.vring.num - 1));
-	i = virtio32_to_cpu(_vq->vdev,
+	i = virtio32_to_cpu(vq->vq.vdev,
 			vq->split.vring.used->ring[last_used].id);
-	*len = virtio32_to_cpu(_vq->vdev,
+	*len = virtio32_to_cpu(vq->vq.vdev,
 			vq->split.vring.used->ring[last_used].len);

 	if (unlikely(i >= vq->split.vring.num)) {
···
 	if (!(vq->split.avail_flags_shadow & VRING_AVAIL_F_NO_INTERRUPT))
 		virtio_store_mb(vq->weak_barriers,
 				&vring_used_event(&vq->split.vring),
-				cpu_to_virtio16(_vq->vdev, vq->last_used_idx));
+				cpu_to_virtio16(vq->vq.vdev, vq->last_used_idx));

 	LAST_ADD_TIME_INVALID(vq);

···
 	return ret;
 }

-static void virtqueue_disable_cb_split(struct virtqueue *_vq)
+static void *virtqueue_get_buf_ctx_split_in_order(struct vring_virtqueue *vq,
+						   unsigned int *len,
+						   void **ctx)
 {
-	struct vring_virtqueue *vq = to_vvq(_vq);
+	void *ret;
+	unsigned int num = vq->split.vring.num;
+	unsigned int num_free = vq->vq.num_free;
+	u16 last_used, last_used_idx;

+	START_USE(vq);
+
+	if (unlikely(vq->broken)) {
+		END_USE(vq);
+		return NULL;
+	}
+
+	last_used = vq->last_used & (num - 1);
+	last_used_idx = vq->last_used_idx & (num - 1);
+
+	if (vq->batch_last.id == UINT_MAX) {
+		if (!more_used_split_in_order(vq)) {
+			pr_debug("No more buffers in queue\n");
+			END_USE(vq);
+			return NULL;
+		}
+
+		/*
+		 * Only get used array entries after they have been
+		 * exposed by host.
+		 */
+		virtio_rmb(vq->weak_barriers);
+
+		vq->batch_last.id = virtio32_to_cpu(vq->vq.vdev,
+				vq->split.vring.used->ring[last_used_idx].id);
+		vq->batch_last.len = virtio32_to_cpu(vq->vq.vdev,
+				vq->split.vring.used->ring[last_used_idx].len);
+	}
+
+	if (vq->batch_last.id == last_used) {
+		vq->batch_last.id = UINT_MAX;
+		*len = vq->batch_last.len;
+	} else {
+		*len = vq->split.desc_state[last_used].total_in_len;
+	}
+
+	if (unlikely(!vq->split.desc_state[last_used].data)) {
+		BAD_RING(vq, "id %u is not a head!\n", last_used);
+		return NULL;
+	}
+
+	/* detach_buf_split clears data, so grab it now. */
+	ret = vq->split.desc_state[last_used].data;
+	detach_buf_split_in_order(vq, last_used, ctx);
+
+	vq->last_used_idx++;
+	vq->last_used += (vq->vq.num_free - num_free);
+	/* If we expect an interrupt for the next entry, tell host
+	 * by writing event index and flush out the write before
+	 * the read in the next get_buf call. */
+	if (!(vq->split.avail_flags_shadow & VRING_AVAIL_F_NO_INTERRUPT))
+		virtio_store_mb(vq->weak_barriers,
+				&vring_used_event(&vq->split.vring),
+				cpu_to_virtio16(vq->vq.vdev, vq->last_used_idx));
+
+	LAST_ADD_TIME_INVALID(vq);
+
+	END_USE(vq);
+	return ret;
+}
+
+static void virtqueue_disable_cb_split(struct vring_virtqueue *vq)
+{
 	if (!(vq->split.avail_flags_shadow & VRING_AVAIL_F_NO_INTERRUPT)) {
 		vq->split.avail_flags_shadow |= VRING_AVAIL_F_NO_INTERRUPT;

···
 			vring_used_event(&vq->split.vring) = 0x0;
 		else
 			vq->split.vring.avail->flags =
-				cpu_to_virtio16(_vq->vdev,
+				cpu_to_virtio16(vq->vq.vdev,
 						vq->split.avail_flags_shadow);
 	}
 }

-static unsigned int virtqueue_enable_cb_prepare_split(struct virtqueue *_vq)
+static unsigned int virtqueue_enable_cb_prepare_split(struct vring_virtqueue *vq)
 {
-	struct vring_virtqueue *vq = to_vvq(_vq);
 	u16 last_used_idx;

 	START_USE(vq);
···
 		vq->split.avail_flags_shadow &= ~VRING_AVAIL_F_NO_INTERRUPT;
 		if (!vq->event)
 			vq->split.vring.avail->flags =
-				cpu_to_virtio16(_vq->vdev,
+				cpu_to_virtio16(vq->vq.vdev,
 						vq->split.avail_flags_shadow);
 	}
-	vring_used_event(&vq->split.vring) = cpu_to_virtio16(_vq->vdev,
+	vring_used_event(&vq->split.vring) = cpu_to_virtio16(vq->vq.vdev,
 			last_used_idx = vq->last_used_idx);
 	END_USE(vq);
 	return last_used_idx;
 }

-static bool virtqueue_poll_split(struct virtqueue *_vq, unsigned int last_used_idx)
+static bool virtqueue_enable_cb_delayed_split(struct vring_virtqueue *vq)
 {
-	struct vring_virtqueue *vq = to_vvq(_vq);
-
-	return (u16)last_used_idx != virtio16_to_cpu(_vq->vdev,
-			vq->split.vring.used->idx);
-}
-
-static bool virtqueue_enable_cb_delayed_split(struct virtqueue *_vq)
-{
-	struct vring_virtqueue *vq = to_vvq(_vq);
 	u16 bufs;

 	START_USE(vq);
···
 		vq->split.avail_flags_shadow &= ~VRING_AVAIL_F_NO_INTERRUPT;
 		if (!vq->event)
 			vq->split.vring.avail->flags =
-				cpu_to_virtio16(_vq->vdev,
+				cpu_to_virtio16(vq->vq.vdev,
 						vq->split.avail_flags_shadow);
 	}
 	/* TODO: tune this threshold */
···

 	virtio_store_mb(vq->weak_barriers,
 			&vring_used_event(&vq->split.vring),
-			cpu_to_virtio16(_vq->vdev, vq->last_used_idx + bufs));
+			cpu_to_virtio16(vq->vq.vdev, vq->last_used_idx + bufs));

-	if (unlikely((u16)(virtio16_to_cpu(_vq->vdev, vq->split.vring.used->idx)
+	if (unlikely((u16)(virtio16_to_cpu(vq->vq.vdev, vq->split.vring.used->idx)
 					- vq->last_used_idx) > bufs)) {
 		END_USE(vq);
 		return false;
···
 	return true;
 }

-static void *virtqueue_detach_unused_buf_split(struct virtqueue *_vq)
+static void *virtqueue_detach_unused_buf_split(struct vring_virtqueue *vq)
 {
-	struct vring_virtqueue *vq = to_vvq(_vq);
 	unsigned int i;
 	void *buf;

···
 			continue;
 		/* detach_buf_split clears data, so grab it now. */
 		buf = vq->split.desc_state[i].data;
-		detach_buf_split(vq, i, NULL);
+		if (virtqueue_is_in_order(vq))
+			detach_buf_split_in_order(vq, i, NULL);
+		else
+			detach_buf_split(vq, i, NULL);
 		vq->split.avail_idx_shadow--;
-		vq->split.vring.avail->idx = cpu_to_virtio16(_vq->vdev,
+		vq->split.vring.avail->idx = cpu_to_virtio16(vq->vq.vdev,
 				vq->split.avail_idx_shadow);
 		END_USE(vq);
 		return buf;
···
 	}
 }

-static void virtqueue_reinit_split(struct vring_virtqueue *vq)
+static void virtqueue_reset_split(struct vring_virtqueue *vq)
 {
 	int num;

···

 	/* Put everything in free lists. */
 	vq->free_head = 0;
+	vq->batch_last.id = UINT_MAX;
 }

 static int vring_alloc_state_extra_split(struct vring_virtqueue_split *vring_split)
···
 	return 0;
 }

+static const struct virtqueue_ops split_ops;
+
 static struct virtqueue *__vring_new_virtqueue_split(unsigned int index,
 					       struct vring_virtqueue_split *vring_split,
 					       struct virtio_device *vdev,
···
 	if (!vq)
 		return NULL;

-	vq->packed_ring = false;
 	vq->vq.callback = callback;
 	vq->vq.vdev = vdev;
 	vq->vq.name = name;
···
 	vq->indirect = virtio_has_feature(vdev, VIRTIO_RING_F_INDIRECT_DESC) &&
 		!context;
 	vq->event = virtio_has_feature(vdev, VIRTIO_RING_F_EVENT_IDX);
+	vq->layout = virtio_has_feature(vdev, VIRTIO_F_IN_ORDER) ?
+		VQ_LAYOUT_SPLIT_IN_ORDER : VQ_LAYOUT_SPLIT;

 	if (virtio_has_feature(vdev, VIRTIO_F_ORDER_PLATFORM))
 		vq->weak_barriers = false;
···
 	return vq;
 }

-static int virtqueue_resize_split(struct virtqueue *_vq, u32 num)
+static int virtqueue_resize_split(struct vring_virtqueue *vq, u32 num)
 {
 	struct vring_virtqueue_split vring_split = {};
-	struct vring_virtqueue *vq = to_vvq(_vq);
-	struct virtio_device *vdev = _vq->vdev;
+	struct virtio_device *vdev = vq->vq.vdev;
 	int err;

 	err = vring_alloc_queue_split(&vring_split, vdev, num,
···
 err_state_extra:
 	vring_free_split(&vring_split, vdev, vq->map);
 err:
-	virtqueue_reinit_split(vq);
+	virtqueue_reset_split(vq);
 	return -ENOMEM;
 }

···
 					 unsigned int in_sgs,
 					 void *data,
 					 bool premapped,
-					 gfp_t gfp)
+					 gfp_t gfp,
+					 u16 id,
+					 unsigned long attr)
 {
 	struct vring_desc_extra *extra;
 	struct vring_packed_desc *desc;
 	struct scatterlist *sg;
-	unsigned int i, n, err_idx, len;
-	u16 head, id;
+	unsigned int i, n, err_idx, len, total_in_len = 0;
+	u16 head;
 	dma_addr_t addr;

 	head = vq->packed.next_avail_idx;
···
 	}

 	i = 0;
-	id = vq->free_head;
-	BUG_ON(id == vq->packed.vring.num);

 	for (n = 0; n < out_sgs + in_sgs; n++) {
 		for (sg = sgs[n]; sg; sg = sg_next(sg)) {
 			if (vring_map_one_sg(vq, sg, n < out_sgs ?
 					     DMA_TO_DEVICE : DMA_FROM_DEVICE,
-					     &addr, &len, premapped))
+					     &addr, &len, premapped, attr))
 				goto unmap_release;

 			desc[i].flags = cpu_to_le16(n < out_sgs ?
···
 				extra[i].flags = n < out_sgs ? 0 : VRING_DESC_F_WRITE;
 			}

+			if (n >= out_sgs)
+				total_in_len += len;
 			i++;
 		}
 	}
···
 				1 << VRING_PACKED_DESC_F_USED;
 	}
 	vq->packed.next_avail_idx = n;
-	vq->free_head = vq->packed.desc_extra[id].next;
+	if (!virtqueue_is_in_order(vq))
+		vq->free_head = vq->packed.desc_extra[id].next;

 	/* Store token and indirect buffer state. */
 	vq->packed.desc_state[id].num = 1;
 	vq->packed.desc_state[id].data = data;
 	vq->packed.desc_state[id].indir_desc = desc;
 	vq->packed.desc_state[id].last = id;
+	vq->packed.desc_state[id].total_in_len = total_in_len;

 	vq->num_added += 1;

···
 	return -ENOMEM;
 }

-static inline int virtqueue_add_packed(struct virtqueue *_vq,
+static inline int virtqueue_add_packed(struct vring_virtqueue *vq,
 				       struct scatterlist *sgs[],
 				       unsigned int total_sg,
 				       unsigned int out_sgs,
···
 				       void *data,
 				       void *ctx,
 				       bool premapped,
-				       gfp_t gfp)
+				       gfp_t gfp,
+				       unsigned long attr)
 {
-	struct vring_virtqueue *vq = to_vvq(_vq);
 	struct vring_packed_desc *desc;
 	struct scatterlist *sg;
 	unsigned int i, n, c, descs_used, err_idx, len;
···
 	BUG_ON(total_sg == 0);

 	if (virtqueue_use_indirect(vq, total_sg)) {
+		id = vq->free_head;
+		BUG_ON(id == vq->packed.vring.num);
 		err = virtqueue_add_indirect_packed(vq, sgs, total_sg, out_sgs,
-						    in_sgs, data, premapped, gfp);
+						    in_sgs, data, premapped, gfp,
+						    id, attr);
 		if (err != -ENOMEM) {
 			END_USE(vq);
 			return err;
···

 			if (vring_map_one_sg(vq, sg, n < out_sgs ?
 					     DMA_TO_DEVICE : DMA_FROM_DEVICE,
-					     &addr, &len, premapped))
+					     &addr, &len, premapped, attr))
 				goto unmap_release;

 			flags = cpu_to_le16(vq->packed.avail_used_flags |
···
 	return -EIO;
 }

-static bool virtqueue_kick_prepare_packed(struct virtqueue *_vq)
+static inline int virtqueue_add_packed_in_order(struct vring_virtqueue *vq,
+						struct scatterlist *sgs[],
+						unsigned int total_sg,
+						unsigned int out_sgs,
+						unsigned int in_sgs,
+						void *data,
+						void *ctx,
+						bool premapped,
+						gfp_t gfp,
+						unsigned long attr)
 {
-	struct vring_virtqueue *vq = to_vvq(_vq);
+	struct vring_packed_desc *desc;
+	struct scatterlist *sg;
+	unsigned int i, n, sg_count, err_idx, total_in_len = 0;
+	__le16 head_flags, flags;
+	u16 head, avail_used_flags;
+	bool avail_wrap_counter;
+	int err;
+
+	START_USE(vq);
+
+	BUG_ON(data == NULL);
+	BUG_ON(ctx && vq->indirect);
+
+	if (unlikely(vq->broken)) {
+		END_USE(vq);
+		return -EIO;
+	}
+
+	LAST_ADD_TIME_UPDATE(vq);
+
+	BUG_ON(total_sg == 0);
+
+	if (virtqueue_use_indirect(vq, total_sg)) {
+		err = virtqueue_add_indirect_packed(vq, sgs, total_sg, out_sgs,
+						    in_sgs, data, premapped, gfp,
+						    vq->packed.next_avail_idx,
+						    attr);
+		if (err != -ENOMEM) {
+			END_USE(vq);
+			return err;
+		}
+
+		/* fall back on direct */
+	}
+
+	head = vq->packed.next_avail_idx;
+	avail_used_flags = vq->packed.avail_used_flags;
+	avail_wrap_counter = vq->packed.avail_wrap_counter;
+
+	WARN_ON_ONCE(total_sg > vq->packed.vring.num && !vq->indirect);
+
+	desc = vq->packed.vring.desc;
+	i = head;
+
+	if (unlikely(vq->vq.num_free < total_sg)) {
+		pr_debug("Can't add buf len %i - avail = %i\n",
+			 total_sg, vq->vq.num_free);
+		END_USE(vq);
+		return -ENOSPC;
+	}
+
+	sg_count = 0;
+	for (n = 0; n < out_sgs + in_sgs; n++) {
+		for (sg = sgs[n]; sg; sg = sg_next(sg)) {
+			dma_addr_t addr;
+			u32 len;
+
+			flags = 0;
+			if (++sg_count != total_sg)
+				flags |= cpu_to_le16(VRING_DESC_F_NEXT);
+			if (n >= out_sgs)
+				flags |= cpu_to_le16(VRING_DESC_F_WRITE);
+
+			if (vring_map_one_sg(vq, sg, n < out_sgs ?
+					     DMA_TO_DEVICE : DMA_FROM_DEVICE,
+					     &addr, &len, premapped, attr))
+				goto unmap_release;
+
+			flags |= cpu_to_le16(vq->packed.avail_used_flags);
+
+			if (i == head)
+				head_flags = flags;
+			else
+				desc[i].flags = flags;
+
+			desc[i].addr = cpu_to_le64(addr);
+			desc[i].len = cpu_to_le32(len);
+			desc[i].id = cpu_to_le16(head);
+
+			if (unlikely(vq->use_map_api)) {
+				vq->packed.desc_extra[i].addr = premapped ?
+					DMA_MAPPING_ERROR : addr;
+				vq->packed.desc_extra[i].len = len;
+				vq->packed.desc_extra[i].flags =
+					le16_to_cpu(flags);
+			}
+
+			if ((unlikely(++i >= vq->packed.vring.num))) {
+				i = 0;
+				vq->packed.avail_used_flags ^=
+					1 << VRING_PACKED_DESC_F_AVAIL |
+					1 << VRING_PACKED_DESC_F_USED;
+				vq->packed.avail_wrap_counter ^= 1;
+			}
+
+			if (n >= out_sgs)
+				total_in_len += len;
+		}
+	}
+
+	/* We're using some buffers from the free list. */
+	vq->vq.num_free -= total_sg;
+
+	/* Update free pointer */
+	vq->packed.next_avail_idx = i;
+
+	/* Store token. */
+	vq->packed.desc_state[head].num = total_sg;
+	vq->packed.desc_state[head].data = data;
+	vq->packed.desc_state[head].indir_desc = ctx;
+	vq->packed.desc_state[head].total_in_len = total_in_len;
+
+	/*
+	 * A driver MUST NOT make the first descriptor in the list
+	 * available before all subsequent descriptors comprising
+	 * the list are made available.
+	 */
+	virtio_wmb(vq->weak_barriers);
+	vq->packed.vring.desc[head].flags = head_flags;
+	vq->num_added += total_sg;
+
+	pr_debug("Added buffer head %i to %p\n", head, vq);
+	END_USE(vq);
+
+	return 0;
+
+unmap_release:
+	err_idx = i;
+	i = head;
+	vq->packed.avail_used_flags = avail_used_flags;
+	vq->packed.avail_wrap_counter = avail_wrap_counter;
+
+	for (n = 0; n < total_sg; n++) {
+		if (i == err_idx)
+			break;
+		vring_unmap_extra_packed(vq, &vq->packed.desc_extra[i]);
+		i++;
+		if (i >= vq->packed.vring.num)
+			i = 0;
+	}
+
+	END_USE(vq);
+	return -EIO;
+}
+
+static bool virtqueue_kick_prepare_packed(struct vring_virtqueue *vq)
+{
 	u16 new, old, off_wrap, flags, wrap_counter, event_idx;
 	bool needs_kick;
 	union {
···
 	return needs_kick;
 }

-static void detach_buf_packed(struct vring_virtqueue *vq,
-			      unsigned int id, void **ctx)
+static void detach_buf_packed_in_order(struct vring_virtqueue *vq,
+				       unsigned int id, void **ctx)
 {
 	struct vring_desc_state_packed *state = NULL;
 	struct vring_packed_desc *desc;
···
 	/* Clear data ptr. */
 	state->data = NULL;

-	vq->packed.desc_extra[state->last].next = vq->free_head;
-	vq->free_head = id;
 	vq->vq.num_free += state->num;

 	if (unlikely(vq->use_map_api)) {
···
 	}
 }

+static void detach_buf_packed(struct vring_virtqueue *vq,
+			      unsigned int id, void **ctx)
+{
+	struct vring_desc_state_packed *state = &vq->packed.desc_state[id];
+
+	vq->packed.desc_extra[state->last].next = vq->free_head;
+	vq->free_head = id;
+
+	detach_buf_packed_in_order(vq, id, ctx);
+}
+
 static inline bool is_used_desc_packed(const struct vring_virtqueue *vq,
 				       u16 idx, bool used_wrap_counter)
 {
···
 	return avail == used && used == used_wrap_counter;
 }

-static bool more_used_packed(const struct vring_virtqueue *vq)
+static bool virtqueue_poll_packed(const struct vring_virtqueue *vq,
+				  unsigned int off_wrap)
 {
-	u16 last_used;
-	u16 last_used_idx;
-	bool used_wrap_counter;
+	bool wrap_counter;
+	u16 used_idx;

-	last_used_idx = READ_ONCE(vq->last_used_idx);
-	last_used = packed_last_used(last_used_idx);
-	used_wrap_counter = packed_used_wrap_counter(last_used_idx);
-	return is_used_desc_packed(vq, last_used, used_wrap_counter);
+	wrap_counter = off_wrap >> VRING_PACKED_EVENT_F_WRAP_CTR;
+	used_idx = off_wrap & ~(1 << VRING_PACKED_EVENT_F_WRAP_CTR);
+
+	return is_used_desc_packed(vq, used_idx, wrap_counter);
 }

-static void *virtqueue_get_buf_ctx_packed(struct virtqueue *_vq,
+static bool more_used_packed(const struct vring_virtqueue *vq)
+{
+	return virtqueue_poll_packed(vq, READ_ONCE(vq->last_used_idx));
+}
+
+static void update_last_used_idx_packed(struct vring_virtqueue *vq,
+					u16 id, u16 last_used,
+					u16 used_wrap_counter)
+{
+	last_used += vq->packed.desc_state[id].num;
+	if (unlikely(last_used >= vq->packed.vring.num)) {
+		last_used -= vq->packed.vring.num;
+		used_wrap_counter ^= 1;
+	}
+
+	last_used = (last_used | (used_wrap_counter << VRING_PACKED_EVENT_F_WRAP_CTR));
+	WRITE_ONCE(vq->last_used_idx, last_used);
+
+	/*
+	 * If we expect an interrupt for the next entry, tell host
+	 * by writing event index and flush out the write before
+	 * the read in the next get_buf call.
+	 */
+	if (vq->packed.event_flags_shadow == VRING_PACKED_EVENT_FLAG_DESC)
+		virtio_store_mb(vq->weak_barriers,
+				&vq->packed.vring.driver->off_wrap,
+				cpu_to_le16(vq->last_used_idx));
+}
+
+static bool more_used_packed_in_order(const struct vring_virtqueue *vq)
+{
+	if (vq->batch_last.id != UINT_MAX)
+		return true;
+
+	return virtqueue_poll_packed(vq, READ_ONCE(vq->last_used_idx));
+}
+
+static void *virtqueue_get_buf_ctx_packed_in_order(struct vring_virtqueue *vq,
+						    unsigned int *len,
+						    void **ctx)
+{
+	unsigned int num = vq->packed.vring.num;
+	u16 last_used, last_used_idx;
+	bool used_wrap_counter;
+	void *ret;
+
+	START_USE(vq);
+
+	if (unlikely(vq->broken)) {
+		END_USE(vq);
+		return NULL;
+	}
+
+	last_used_idx = vq->last_used_idx;
+	used_wrap_counter = packed_used_wrap_counter(last_used_idx);
+	last_used = packed_last_used(last_used_idx);
+
+	if (vq->batch_last.id == UINT_MAX) {
+		if (!more_used_packed_in_order(vq)) {
+			pr_debug("No more buffers in queue\n");
+			END_USE(vq);
+			return NULL;
+		}
+		/* Only get used elements after they have been exposed by host. */
+		virtio_rmb(vq->weak_barriers);
+		vq->batch_last.id =
+			le16_to_cpu(vq->packed.vring.desc[last_used].id);
+		vq->batch_last.len =
+			le32_to_cpu(vq->packed.vring.desc[last_used].len);
+	}
+
+	if (vq->batch_last.id == last_used) {
+		vq->batch_last.id = UINT_MAX;
+		*len = vq->batch_last.len;
+	} else {
+		*len = vq->packed.desc_state[last_used].total_in_len;
+	}
+
+	if (unlikely(last_used >= num)) {
+		BAD_RING(vq, "id %u out of range\n", last_used);
+		return NULL;
+	}
+	if (unlikely(!vq->packed.desc_state[last_used].data)) {
+		BAD_RING(vq, "id %u is not a head!\n", last_used);
+		return NULL;
+	}
+
+	/* detach_buf_packed clears data, so grab it now. */
+	ret = vq->packed.desc_state[last_used].data;
+	detach_buf_packed_in_order(vq, last_used, ctx);
+
+	update_last_used_idx_packed(vq, last_used, last_used,
+				    used_wrap_counter);
+
+	LAST_ADD_TIME_INVALID(vq);
+
+	END_USE(vq);
+	return ret;
+}
+
+static void *virtqueue_get_buf_ctx_packed(struct vring_virtqueue *vq,
 					  unsigned int *len,
 					  void **ctx)
 {
-	struct vring_virtqueue *vq = to_vvq(_vq);
+	unsigned int num = vq->packed.vring.num;
 	u16 last_used, id, last_used_idx;
 	bool used_wrap_counter;
 	void *ret;
···
 	id = le16_to_cpu(vq->packed.vring.desc[last_used].id);
 	*len = le32_to_cpu(vq->packed.vring.desc[last_used].len);

-	if (unlikely(id >= vq->packed.vring.num)) {
+	if (unlikely(id >= num)) {
 		BAD_RING(vq, "id %u out of range\n", id);
 		return NULL;
 	}
···
 	ret = vq->packed.desc_state[id].data;
 	detach_buf_packed(vq, id, ctx);

-	last_used += vq->packed.desc_state[id].num;
-	if (unlikely(last_used >= vq->packed.vring.num)) {
-		last_used -= vq->packed.vring.num;
-		used_wrap_counter ^= 1;
-	}
-
-	last_used = (last_used | (used_wrap_counter << VRING_PACKED_EVENT_F_WRAP_CTR));
-	WRITE_ONCE(vq->last_used_idx, last_used);
-
-	/*
-	 * If we expect an interrupt for the next entry, tell host
-	 * by writing event index and flush out the write before
-	 * the read in the next get_buf call.
-	 */
-	if (vq->packed.event_flags_shadow == VRING_PACKED_EVENT_FLAG_DESC)
-		virtio_store_mb(vq->weak_barriers,
-				&vq->packed.vring.driver->off_wrap,
-				cpu_to_le16(vq->last_used_idx));
+	update_last_used_idx_packed(vq, id, last_used, used_wrap_counter);

 	LAST_ADD_TIME_INVALID(vq);

···
 	return ret;
 }

-static void virtqueue_disable_cb_packed(struct virtqueue *_vq)
+static void virtqueue_disable_cb_packed(struct vring_virtqueue *vq)
 {
-	struct vring_virtqueue *vq = to_vvq(_vq);
-
 	if (vq->packed.event_flags_shadow != VRING_PACKED_EVENT_FLAG_DISABLE) {
 		vq->packed.event_flags_shadow = VRING_PACKED_EVENT_FLAG_DISABLE;

···
 	}
 }

-static unsigned int virtqueue_enable_cb_prepare_packed(struct virtqueue *_vq)
+static unsigned int virtqueue_enable_cb_prepare_packed(struct vring_virtqueue *vq)
 {
-	struct vring_virtqueue *vq = to_vvq(_vq);
-
 	START_USE(vq);

 	/*
···
 	return vq->last_used_idx;
 }

-static bool virtqueue_poll_packed(struct virtqueue *_vq, u16 off_wrap)
+static bool virtqueue_enable_cb_delayed_packed(struct vring_virtqueue *vq)
 {
-	struct vring_virtqueue *vq = to_vvq(_vq);
-	bool wrap_counter;
-	u16 used_idx;
-
-	wrap_counter = off_wrap >> VRING_PACKED_EVENT_F_WRAP_CTR;
-	used_idx = off_wrap & ~(1 << VRING_PACKED_EVENT_F_WRAP_CTR);
-
-	return is_used_desc_packed(vq, used_idx, wrap_counter);
-}
-
-static bool virtqueue_enable_cb_delayed_packed(struct virtqueue *_vq)
-{
-	struct vring_virtqueue *vq = to_vvq(_vq);
 	u16 used_idx, wrap_counter, last_used_idx;
 	u16 bufs;

···
 	return true;
 }

-static void *virtqueue_detach_unused_buf_packed(struct virtqueue *_vq)
+static void *virtqueue_detach_unused_buf_packed(struct vring_virtqueue *vq)
 {
-	struct vring_virtqueue *vq = to_vvq(_vq);
 	unsigned int i;
 	void *buf;

···
 			continue;
 		/* detach_buf clears data, so grab it now. */
 		buf = vq->packed.desc_state[i].data;
-		detach_buf_packed(vq, i, NULL);
+		if (virtqueue_is_in_order(vq))
+			detach_buf_packed_in_order(vq, i, NULL);
+		else
+			detach_buf_packed(vq, i, NULL);
 		END_USE(vq);
 		return buf;
 	}
···

 	for (i = 0; i < num - 1; i++)
 		desc_extra[i].next = i + 1;
+
+	desc_extra[num - 1].next = 0;

 	return desc_extra;
 }
···
 {
 	vq->packed = *vring_packed;

-	/* Put everything in free lists. */
-	vq->free_head = 0;
+	if (virtqueue_is_in_order(vq)) {
+		vq->batch_last.id = UINT_MAX;
+	} else {
+		/*
+		 * Put everything in free lists. Note that
+		 * next_avail_idx is sufficient with IN_ORDER so
+		 * free_head is unused.
+		 */
+		vq->free_head = 0;
+	}
 }
-
-static void virtqueue_reinit_packed(struct vring_virtqueue *vq)
+static void virtqueue_reset_packed(struct vring_virtqueue *vq)
 {
 	memset(vq->packed.vring.device, 0, vq->packed.event_size_in_bytes);
 	memset(vq->packed.vring.driver, 0, vq->packed.event_size_in_bytes);

 	/* we need to reset the desc.flags. For more, see is_used_desc_packed() */
 	memset(vq->packed.vring.desc, 0, vq->packed.ring_size_in_bytes);
-
 	virtqueue_init(vq, vq->packed.vring.num);
 	virtqueue_vring_init_packed(&vq->packed, !!vq->vq.callback);
 }
+
+static const struct virtqueue_ops packed_ops;

 static struct virtqueue *__vring_new_virtqueue_packed(unsigned int index,
 					       struct vring_virtqueue_packed *vring_packed,
···
 #else
 	vq->broken = false;
 #endif
-	vq->packed_ring = true;
 	vq->map = map;
 	vq->use_map_api = vring_use_map_api(vdev);

 	vq->indirect = virtio_has_feature(vdev, VIRTIO_RING_F_INDIRECT_DESC) &&
 		!context;
 	vq->event = virtio_has_feature(vdev, VIRTIO_RING_F_EVENT_IDX);
+	vq->layout = virtio_has_feature(vdev, VIRTIO_F_IN_ORDER) ?
+		VQ_LAYOUT_PACKED_IN_ORDER : VQ_LAYOUT_PACKED;

 	if (virtio_has_feature(vdev, VIRTIO_F_ORDER_PLATFORM))
 		vq->weak_barriers = false;
···
 	return vq;
 }

-static int virtqueue_resize_packed(struct virtqueue *_vq, u32 num)
+static int virtqueue_resize_packed(struct vring_virtqueue *vq, u32 num)
 {
 	struct vring_virtqueue_packed vring_packed = {};
-	struct vring_virtqueue *vq = to_vvq(_vq);
-	struct virtio_device *vdev = _vq->vdev;
+	struct virtio_device *vdev = vq->vq.vdev;
 	int err;

 	if (vring_alloc_queue_packed(&vring_packed, vdev, num, vq->map))
···
 err_state_extra:
 	vring_free_packed(&vring_packed, vdev, vq->map);
 err_ring:
-	virtqueue_reinit_packed(vq);
+	virtqueue_reset_packed(vq);
 	return -ENOMEM;
 }
+
+static const struct virtqueue_ops split_ops = {
+	.add = virtqueue_add_split,
+	.get = virtqueue_get_buf_ctx_split,
+	.kick_prepare = virtqueue_kick_prepare_split,
+	.disable_cb = virtqueue_disable_cb_split,
+	.enable_cb_delayed = virtqueue_enable_cb_delayed_split,
+	.enable_cb_prepare = virtqueue_enable_cb_prepare_split,
+	.poll = virtqueue_poll_split,
+	.detach_unused_buf = virtqueue_detach_unused_buf_split,
+	.more_used = more_used_split,
+	.resize = virtqueue_resize_split,
+	.reset = virtqueue_reset_split,
+};
+
+static const struct virtqueue_ops packed_ops = {
+	.add = virtqueue_add_packed,
+	.get = virtqueue_get_buf_ctx_packed,
+	.kick_prepare = virtqueue_kick_prepare_packed,
+	.disable_cb = virtqueue_disable_cb_packed,
+	.enable_cb_delayed = virtqueue_enable_cb_delayed_packed,
+	.enable_cb_prepare = virtqueue_enable_cb_prepare_packed,
+	.poll = virtqueue_poll_packed,
+	.detach_unused_buf = virtqueue_detach_unused_buf_packed,
+	.more_used = more_used_packed,
+	.resize = virtqueue_resize_packed,
+	.reset = virtqueue_reset_packed,
+};
+
+static const struct virtqueue_ops split_in_order_ops = {
+	.add = virtqueue_add_split,
+	.get = virtqueue_get_buf_ctx_split_in_order,
+	.kick_prepare = virtqueue_kick_prepare_split,
+	.disable_cb = virtqueue_disable_cb_split,
+	.enable_cb_delayed = virtqueue_enable_cb_delayed_split,
+	.enable_cb_prepare = virtqueue_enable_cb_prepare_split,
+	.poll = virtqueue_poll_split,
+	.detach_unused_buf = virtqueue_detach_unused_buf_split,
+	.more_used = more_used_split_in_order,
+	.resize = virtqueue_resize_split,
+	.reset = virtqueue_reset_split,
+};
+
+static const struct virtqueue_ops packed_in_order_ops = {
+	.add = virtqueue_add_packed_in_order,
+	.get = virtqueue_get_buf_ctx_packed_in_order,
+	.kick_prepare = virtqueue_kick_prepare_packed,
+	.disable_cb = virtqueue_disable_cb_packed,
+	.enable_cb_delayed = virtqueue_enable_cb_delayed_packed,
+	.enable_cb_prepare = virtqueue_enable_cb_prepare_packed,
+	.poll = virtqueue_poll_packed,
+	.detach_unused_buf = virtqueue_detach_unused_buf_packed,
+	.more_used = more_used_packed_in_order,
+	.resize = virtqueue_resize_packed,
+	.reset = virtqueue_reset_packed,
+};

 static int virtqueue_disable_and_recycle(struct virtqueue *_vq,
 					 void (*recycle)(struct virtqueue *vq, void *buf))
···
  * Generic functions and exported symbols.
  */

+#define VIRTQUEUE_CALL(vq, op, ...) \
+	({ \
+	typeof(vq) __VIRTQUEUE_CALL_vq = (vq); \
+	typeof(split_ops.op(__VIRTQUEUE_CALL_vq, ##__VA_ARGS__)) ret; \
+	\
+	switch (__VIRTQUEUE_CALL_vq->layout) { \
+	case VQ_LAYOUT_SPLIT: \
+		ret = split_ops.op(__VIRTQUEUE_CALL_vq, ##__VA_ARGS__); \
+		break; \
+	case VQ_LAYOUT_PACKED: \
+		ret = packed_ops.op(__VIRTQUEUE_CALL_vq, ##__VA_ARGS__);\
+		break; \
+	case VQ_LAYOUT_SPLIT_IN_ORDER: \
+		ret = split_in_order_ops.op(vq, ##__VA_ARGS__); \
+		break; \
+	case VQ_LAYOUT_PACKED_IN_ORDER: \
+		ret = packed_in_order_ops.op(vq, ##__VA_ARGS__); \
+		break; \
+	default: \
+		BUG(); \
+		break; \
+	} \
+	ret; \
+})
+
+#define VOID_VIRTQUEUE_CALL(vq, op, ...) \
+	({ \
+	typeof(vq) __VIRTQUEUE_CALL_vq = (vq); \
+	\
+	switch (__VIRTQUEUE_CALL_vq->layout) { \
+	case VQ_LAYOUT_SPLIT: \
+		split_ops.op(__VIRTQUEUE_CALL_vq, ##__VA_ARGS__); \
+		break; \
+	case VQ_LAYOUT_PACKED: \
+		packed_ops.op(__VIRTQUEUE_CALL_vq, ##__VA_ARGS__); \
+		break; \
+	case VQ_LAYOUT_SPLIT_IN_ORDER: \
+		split_in_order_ops.op(vq, ##__VA_ARGS__); \
+		break; \
+	case VQ_LAYOUT_PACKED_IN_ORDER: \
+		packed_in_order_ops.op(vq, ##__VA_ARGS__); \
+		break; \
+	default: \
+		BUG(); \
+		break; \
+	} \
+})
+
 static inline int virtqueue_add(struct virtqueue *_vq,
 				struct scatterlist *sgs[],
 				unsigned int total_sg,
···
 				void *data,
 				void *ctx,
 				bool premapped,
-				gfp_t gfp)
+				gfp_t gfp,
+				unsigned long attr)
 {
 	struct vring_virtqueue *vq = to_vvq(_vq);

-	return vq->packed_ring ? virtqueue_add_packed(_vq, sgs, total_sg,
-					out_sgs, in_sgs, data, ctx, premapped, gfp) :
-				 virtqueue_add_split(_vq, sgs, total_sg,
-					out_sgs, in_sgs, data, ctx, premapped, gfp);
+	return VIRTQUEUE_CALL(vq, add, sgs, total_sg,
+			      out_sgs, in_sgs, data,
+			      ctx, premapped, gfp, attr);
 }

 /**
···
 			total_sg++;
 	}
 	return virtqueue_add(_vq, sgs, total_sg, out_sgs, in_sgs,
-			     data, NULL, false, gfp);
+			     data, NULL, false, gfp, 0);
 }
 EXPORT_SYMBOL_GPL(virtqueue_add_sgs);

···
 			 void *data,
 			 gfp_t gfp)
 {
-	return virtqueue_add(vq, &sg, num, 1, 0, data, NULL, false, gfp);
+	return virtqueue_add(vq, &sg, num, 1, 0, data, NULL, false, gfp, 0);
 }
 EXPORT_SYMBOL_GPL(virtqueue_add_outbuf);

···
 				   void *data,
 				   gfp_t gfp)
 {
-	return virtqueue_add(vq, &sg, num, 1, 0, data, NULL, true, gfp);
+	return virtqueue_add(vq, &sg, num, 1, 0, data, NULL, true, gfp, 0);
 }
 EXPORT_SYMBOL_GPL(virtqueue_add_outbuf_premapped);

···
 			void *data,
 			gfp_t gfp)
 {
-	return virtqueue_add(vq, &sg, num, 0, 1, data, NULL, false, gfp);
+	return virtqueue_add(vq, &sg, num, 0, 1, data, NULL, false, gfp, 0);
 }
 EXPORT_SYMBOL_GPL(virtqueue_add_inbuf);
+
+/**
+ * virtqueue_add_inbuf_cache_clean - expose input buffers with cache clean
+ * @vq: the struct virtqueue we're talking about.
+ * @sg: scatterlist (must be well-formed and terminated!)
+ * @num: the number of entries in @sg writable by other side
+ * @data: the token identifying the buffer.
+ * @gfp: how to do memory allocations (if necessary).
2398 + * 2399 + * Same as virtqueue_add_inbuf but passes DMA_ATTR_CPU_CACHE_CLEAN to indicate 2400 + * that the CPU will not dirty any cacheline overlapping this buffer while it 2401 + * is available, and to suppress overlapping cacheline warnings in DMA debug 2402 + * builds. 2403 + * 2404 + * Caller must ensure we don't call this with other virtqueue operations 2405 + * at the same time (except where noted). 2406 + * 2407 + * Returns zero or a negative error (ie. ENOSPC, ENOMEM, EIO). 2408 + */ 2409 + int virtqueue_add_inbuf_cache_clean(struct virtqueue *vq, 2410 + struct scatterlist *sg, unsigned int num, 2411 + void *data, 2412 + gfp_t gfp) 2413 + { 2414 + return virtqueue_add(vq, &sg, num, 0, 1, data, NULL, false, gfp, 2415 + DMA_ATTR_CPU_CACHE_CLEAN); 2416 + } 2417 + EXPORT_SYMBOL_GPL(virtqueue_add_inbuf_cache_clean); 2907 2418 2908 2419 /** 2909 2420 * virtqueue_add_inbuf_ctx - expose input buffers to other end ··· 2953 2408 void *ctx, 2954 2409 gfp_t gfp) 2955 2410 { 2956 - return virtqueue_add(vq, &sg, num, 0, 1, data, ctx, false, gfp); 2411 + return virtqueue_add(vq, &sg, num, 0, 1, data, ctx, false, gfp, 0); 2957 2412 } 2958 2413 EXPORT_SYMBOL_GPL(virtqueue_add_inbuf_ctx); 2959 2414 ··· 2978 2433 void *ctx, 2979 2434 gfp_t gfp) 2980 2435 { 2981 - return virtqueue_add(vq, &sg, num, 0, 1, data, ctx, true, gfp); 2436 + return virtqueue_add(vq, &sg, num, 0, 1, data, ctx, true, gfp, 0); 2982 2437 } 2983 2438 EXPORT_SYMBOL_GPL(virtqueue_add_inbuf_premapped); 2984 2439 ··· 3014 2469 { 3015 2470 struct vring_virtqueue *vq = to_vvq(_vq); 3016 2471 3017 - return vq->packed_ring ? virtqueue_kick_prepare_packed(_vq) : 3018 - virtqueue_kick_prepare_split(_vq); 2472 + return VIRTQUEUE_CALL(vq, kick_prepare); 3019 2473 } 3020 2474 EXPORT_SYMBOL_GPL(virtqueue_kick_prepare); 3021 2475 ··· 3084 2540 { 3085 2541 struct vring_virtqueue *vq = to_vvq(_vq); 3086 2542 3087 - return vq->packed_ring ? 
virtqueue_get_buf_ctx_packed(_vq, len, ctx) : 3088 - virtqueue_get_buf_ctx_split(_vq, len, ctx); 2543 + return VIRTQUEUE_CALL(vq, get, len, ctx); 3089 2544 } 3090 2545 EXPORT_SYMBOL_GPL(virtqueue_get_buf_ctx); 3091 2546 ··· 3106 2563 { 3107 2564 struct vring_virtqueue *vq = to_vvq(_vq); 3108 2565 3109 - if (vq->packed_ring) 3110 - virtqueue_disable_cb_packed(_vq); 3111 - else 3112 - virtqueue_disable_cb_split(_vq); 2566 + VOID_VIRTQUEUE_CALL(vq, disable_cb); 3113 2567 } 3114 2568 EXPORT_SYMBOL_GPL(virtqueue_disable_cb); 3115 2569 ··· 3129 2589 if (vq->event_triggered) 3130 2590 vq->event_triggered = false; 3131 2591 3132 - return vq->packed_ring ? virtqueue_enable_cb_prepare_packed(_vq) : 3133 - virtqueue_enable_cb_prepare_split(_vq); 2592 + return VIRTQUEUE_CALL(vq, enable_cb_prepare); 3134 2593 } 3135 2594 EXPORT_SYMBOL_GPL(virtqueue_enable_cb_prepare); 3136 2595 ··· 3150 2611 return false; 3151 2612 3152 2613 virtio_mb(vq->weak_barriers); 3153 - return vq->packed_ring ? virtqueue_poll_packed(_vq, last_used_idx) : 3154 - virtqueue_poll_split(_vq, last_used_idx); 2614 + 2615 + return VIRTQUEUE_CALL(vq, poll, last_used_idx); 3155 2616 } 3156 2617 EXPORT_SYMBOL_GPL(virtqueue_poll); 3157 2618 ··· 3194 2655 if (vq->event_triggered) 3195 2656 data_race(vq->event_triggered = false); 3196 2657 3197 - return vq->packed_ring ? virtqueue_enable_cb_delayed_packed(_vq) : 3198 - virtqueue_enable_cb_delayed_split(_vq); 2658 + return VIRTQUEUE_CALL(vq, enable_cb_delayed); 3199 2659 } 3200 2660 EXPORT_SYMBOL_GPL(virtqueue_enable_cb_delayed); 3201 2661 ··· 3210 2672 { 3211 2673 struct vring_virtqueue *vq = to_vvq(_vq); 3212 2674 3213 - return vq->packed_ring ? 
virtqueue_detach_unused_buf_packed(_vq) : 3214 - virtqueue_detach_unused_buf_split(_vq); 2675 + return VIRTQUEUE_CALL(vq, detach_unused_buf); 3215 2676 } 3216 2677 EXPORT_SYMBOL_GPL(virtqueue_detach_unused_buf); 3217 2678 3218 2679 static inline bool more_used(const struct vring_virtqueue *vq) 3219 2680 { 3220 - return vq->packed_ring ? more_used_packed(vq) : more_used_split(vq); 2681 + return VIRTQUEUE_CALL(vq, more_used); 3221 2682 } 3222 2683 3223 2684 /** ··· 3346 2809 if (!num) 3347 2810 return -EINVAL; 3348 2811 3349 - if ((vq->packed_ring ? vq->packed.vring.num : vq->split.vring.num) == num) 2812 + if (virtqueue_get_vring_size(_vq) == num) 3350 2813 return 0; 3351 2814 3352 2815 err = virtqueue_disable_and_recycle(_vq, recycle); ··· 3355 2818 if (recycle_done) 3356 2819 recycle_done(_vq); 3357 2820 3358 - if (vq->packed_ring) 3359 - err = virtqueue_resize_packed(_vq, num); 3360 - else 3361 - err = virtqueue_resize_split(_vq, num); 2821 + err = VIRTQUEUE_CALL(vq, resize, num); 3362 2822 3363 2823 err_reset = virtqueue_enable_after_reset(_vq); 3364 2824 if (err_reset) ··· 3393 2859 if (recycle_done) 3394 2860 recycle_done(_vq); 3395 2861 3396 - if (vq->packed_ring) 3397 - virtqueue_reinit_packed(vq); 3398 - else 3399 - virtqueue_reinit_split(vq); 2862 + VOID_VIRTQUEUE_CALL(vq, reset); 3400 2863 3401 2864 return virtqueue_enable_after_reset(_vq); 3402 2865 } ··· 3436 2905 struct vring_virtqueue *vq = to_vvq(_vq); 3437 2906 3438 2907 if (vq->we_own_ring) { 3439 - if (vq->packed_ring) { 2908 + if (virtqueue_is_packed(vq)) { 3440 2909 vring_free_queue(vq->vq.vdev, 3441 2910 vq->packed.ring_size_in_bytes, 3442 2911 vq->packed.vring.desc, ··· 3465 2934 vq->map); 3466 2935 } 3467 2936 } 3468 - if (!vq->packed_ring) { 2937 + if (!virtqueue_is_packed(vq)) { 3469 2938 kfree(vq->split.desc_state); 3470 2939 kfree(vq->split.desc_extra); 3471 2940 } ··· 3490 2959 struct vring_virtqueue *vq = to_vvq(_vq); 3491 2960 u16 next; 3492 2961 3493 - if (vq->packed_ring) 2962 + if 
(virtqueue_is_packed(vq)) 3494 2963 next = (vq->packed.next_avail_idx & 3495 2964 ~(-(1 << VRING_PACKED_EVENT_F_WRAP_CTR))) | 3496 2965 vq->packed.avail_wrap_counter << ··· 3523 2992 break; 3524 2993 case VIRTIO_F_NOTIFICATION_DATA: 3525 2994 break; 2995 + case VIRTIO_F_IN_ORDER: 2996 + break; 3526 2997 default: 3527 2998 /* We don't understand this bit. */ 3528 2999 __virtio_clear_bit(vdev, i); ··· 3545 3012 3546 3013 const struct vring_virtqueue *vq = to_vvq(_vq); 3547 3014 3548 - return vq->packed_ring ? vq->packed.vring.num : vq->split.vring.num; 3015 + return virtqueue_is_packed(vq) ? vq->packed.vring.num : 3016 + vq->split.vring.num; 3549 3017 } 3550 3018 EXPORT_SYMBOL_GPL(virtqueue_get_vring_size); 3551 3019 ··· 3629 3095 3630 3096 BUG_ON(!vq->we_own_ring); 3631 3097 3632 - if (vq->packed_ring) 3098 + if (virtqueue_is_packed(vq)) 3633 3099 return vq->packed.ring_dma_addr; 3634 3100 3635 3101 return vq->split.queue_dma_addr; ··· 3642 3108 3643 3109 BUG_ON(!vq->we_own_ring); 3644 3110 3645 - if (vq->packed_ring) 3111 + if (virtqueue_is_packed(vq)) 3646 3112 return vq->packed.driver_event_dma_addr; 3647 3113 3648 3114 return vq->split.queue_dma_addr + ··· 3656 3122 3657 3123 BUG_ON(!vq->we_own_ring); 3658 3124 3659 - if (vq->packed_ring) 3125 + if (virtqueue_is_packed(vq)) 3660 3126 return vq->packed.device_event_dma_addr; 3661 3127 3662 3128 return vq->split.queue_dma_addr +
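Reviewer note: the VIRTQUEUE_CALL hunks in this file replace the old `vq->packed_ring ? ... : ...` ternaries with a switch over the queue layout that calls into one of four constant ops tables. The block below is a minimal userspace sketch of that pattern, not the kernel code: every `demo_*` / `DEMO_*` name is hypothetical, the callbacks only return tags, and `abort()` stands in for `BUG()`. It relies on GNU C statement expressions, as the kernel macro does. Unlike the hunk above, the sketch passes the cached pointer in all four cases.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the kernel's VQ_LAYOUT_* values. */
enum demo_layout {
	DEMO_LAYOUT_SPLIT,
	DEMO_LAYOUT_PACKED,
	DEMO_LAYOUT_SPLIT_IN_ORDER,
	DEMO_LAYOUT_PACKED_IN_ORDER,
};

struct demo_vq {
	enum demo_layout layout;
};

struct demo_ops {
	int (*get)(struct demo_vq *vq);
	int (*kick_prepare)(struct demo_vq *vq);
};

/* Each callback returns a distinct tag so a caller can see which one ran. */
static int demo_get_split(struct demo_vq *vq) { (void)vq; return 1; }
static int demo_get_packed(struct demo_vq *vq) { (void)vq; return 2; }
static int demo_get_split_in_order(struct demo_vq *vq) { (void)vq; return 3; }
static int demo_get_packed_in_order(struct demo_vq *vq) { (void)vq; return 4; }
static int demo_kick_split(struct demo_vq *vq) { (void)vq; return 10; }
static int demo_kick_packed(struct demo_vq *vq) { (void)vq; return 20; }

/* The in-order tables reuse kick_prepare and only swap the get callback. */
static const struct demo_ops demo_split_ops = {
	.get = demo_get_split, .kick_prepare = demo_kick_split,
};
static const struct demo_ops demo_packed_ops = {
	.get = demo_get_packed, .kick_prepare = demo_kick_packed,
};
static const struct demo_ops demo_split_in_order_ops = {
	.get = demo_get_split_in_order, .kick_prepare = demo_kick_split,
};
static const struct demo_ops demo_packed_in_order_ops = {
	.get = demo_get_packed_in_order, .kick_prepare = demo_kick_packed,
};

/* Evaluate vq once, then pick the ops table by layout (GNU C extension). */
#define DEMO_CALL(vq, op)						\
({									\
	struct demo_vq *demo_call_vq = (vq);				\
	int demo_call_ret;						\
									\
	switch (demo_call_vq->layout) {					\
	case DEMO_LAYOUT_SPLIT:						\
		demo_call_ret = demo_split_ops.op(demo_call_vq);	\
		break;							\
	case DEMO_LAYOUT_PACKED:					\
		demo_call_ret = demo_packed_ops.op(demo_call_vq);	\
		break;							\
	case DEMO_LAYOUT_SPLIT_IN_ORDER:				\
		demo_call_ret = demo_split_in_order_ops.op(demo_call_vq); \
		break;							\
	case DEMO_LAYOUT_PACKED_IN_ORDER:				\
		demo_call_ret = demo_packed_in_order_ops.op(demo_call_vq); \
		break;							\
	default:							\
		abort();	/* stand-in for BUG() */		\
	}								\
	demo_call_ret;							\
})

int demo_get(struct demo_vq *vq)
{
	return DEMO_CALL(vq, get);
}

int demo_kick_prepare(struct demo_vq *vq)
{
	return DEMO_CALL(vq, kick_prepare);
}
```

With a queue whose layout is `DEMO_LAYOUT_SPLIT_IN_ORDER`, `demo_get()` reaches the in-order callback while `demo_kick_prepare()` still reaches the shared split one, which mirrors how `split_in_order_ops` is populated in the diff.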
+20
include/linux/dma-mapping.h
···
 #include <linux/dma-direction.h>
 #include <linux/scatterlist.h>
 #include <linux/bug.h>
+#include <linux/cache.h>
 
 /**
  * List of possible attributes associated with a DMA mapping. The semantics
···
  * provided must never be mapped cacheable into the CPU.
  */
 #define DMA_ATTR_MMIO		(1UL << 10)
+
+/*
+ * DMA_ATTR_CPU_CACHE_CLEAN: Indicates the CPU will not dirty any cacheline
+ * overlapping this buffer while it is mapped for DMA. All mappings sharing
+ * a cacheline must have this attribute for this to be considered safe.
+ */
+#define DMA_ATTR_CPU_CACHE_CLEAN	(1UL << 11)
 
 /*
  * A dma_addr_t can hold any valid DMA or bus address for the platform. It can
···
 	return 1;
 }
 #endif
+
+#ifdef ARCH_HAS_DMA_MINALIGN
+#define ____dma_from_device_aligned __aligned(ARCH_DMA_MINALIGN)
+#else
+#define ____dma_from_device_aligned
+#endif
+/* Mark start of DMA buffer */
+#define __dma_from_device_group_begin(GROUP)			\
+	__cacheline_group_begin(GROUP) ____dma_from_device_aligned
+/* Mark end of DMA buffer */
+#define __dma_from_device_group_end(GROUP)			\
+	__cacheline_group_end(GROUP) ____dma_from_device_aligned
 
 static inline void *dmam_alloc_coherent(struct device *dev, size_t size,
 		dma_addr_t *dma_handle, gfp_t gfp)
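Reviewer note: the effect of the two group markers on structure layout can be reproduced in userspace. The sketch below is an approximation under stated assumptions: `DEMO_DMA_MINALIGN` is fixed at 128 (the documentation hunk says the alignment "can be as large as 128 bytes"), the markers are written as aligned zero-length arrays (a GNU C extension), and all `demo_*` names are hypothetical rather than kernel API.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed worst-case DMA alignment; the real value is per-architecture. */
#define DEMO_DMA_MINALIGN 128

/* Zero-length, over-aligned markers (GNU C) that push neighbours apart. */
#define demo_dma_group_begin(GROUP) \
	char demo_dma_begin__##GROUP[0] __attribute__((aligned(DEMO_DMA_MINALIGN)))
#define demo_dma_group_end(GROUP) \
	char demo_dma_end__##GROUP[0] __attribute__((aligned(DEMO_DMA_MINALIGN)))

struct demo_device {
	int lock1;			/* CPU-written field before the buffer */
	demo_dma_group_begin(rx);
	char dma_buffer[16];		/* device-written buffer */
	demo_dma_group_end(rx);
	int lock2;			/* CPU-written field after the buffer */
};

/* True if two byte offsets fall into the same DEMO_DMA_MINALIGN-sized block. */
static inline int demo_shares_block(size_t a, size_t b)
{
	return a / DEMO_DMA_MINALIGN == b / DEMO_DMA_MINALIGN;
}
```

In this layout the buffer starts on its own 128-byte block and `lock2` starts on the next one, so neither CPU-written field shares a block with the buffer. The price is padding: the structure grows to a multiple of the assumed alignment.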
+3 -1
include/linux/vdpa.h
···
  *				@idx: virtqueue index
  *				Returns the affinity mask
  * @set_group_asid:		Set address space identifier for a
- *				virtqueue group (optional)
+ *				virtqueue group (optional). Caller must
+ *				prevent this from being executed concurrently
+ *				with set_status.
  *				@vdev: vdpa device
  *				@group: virtqueue group
  *				@asid: address space id for this group
+8 -3
include/linux/virtio.h
···
 	void *priv;
 };
 
-struct vduse_iova_domain;
+struct vduse_vq_group;
 
 union virtio_map {
 	/* Device that performs DMA */
 	struct device *dma_dev;
-	/* VDUSE specific mapping data */
-	struct vduse_iova_domain *iova_domain;
+	/* VDUSE specific virtqueue group for doing map */
+	struct vduse_vq_group *group;
 };
 
 int virtqueue_add_outbuf(struct virtqueue *vq,
···
 			struct scatterlist sg[], unsigned int num,
 			void *data,
 			gfp_t gfp);
+
+int virtqueue_add_inbuf_cache_clean(struct virtqueue *vq,
+				    struct scatterlist sg[], unsigned int num,
+				    void *data,
+				    gfp_t gfp);
 
 int virtqueue_add_inbuf_ctx(struct virtqueue *vq,
 			    struct scatterlist sg[], unsigned int num,
+80 -5
include/uapi/linux/vduse.h
···
 
 #define VDUSE_API_VERSION	0
 
+/* VQ groups and ASID support */
+
+#define VDUSE_API_VERSION_1	1
+
 /*
  * Get the version of VDUSE API that kernel supported (VDUSE_API_VERSION).
  * This is used for future extension.
···
  * @features: virtio features
  * @vq_num: the number of virtqueues
  * @vq_align: the allocation alignment of virtqueue's metadata
+ * @ngroups: number of vq groups that VDUSE device declares
+ * @nas: number of address spaces that VDUSE device declares
  * @reserved: for future use, needs to be initialized to zero
  * @config_size: the size of the configuration space
  * @config: the buffer of the configuration space
···
 	__u64 features;
 	__u32 vq_num;
 	__u32 vq_align;
-	__u32 reserved[13];
+	__u32 ngroups; /* if VDUSE_API_VERSION >= 1 */
+	__u32 nas; /* if VDUSE_API_VERSION >= 1 */
+	__u32 reserved[11];
 	__u32 config_size;
 	__u8 config[];
 };
···
  * struct vduse_vq_config - basic configuration of a virtqueue
  * @index: virtqueue index
  * @max_size: the max size of virtqueue
- * @reserved: for future use, needs to be initialized to zero
+ * @reserved1: for future use, needs to be initialized to zero
+ * @group: virtqueue group
+ * @reserved2: for future use, needs to be initialized to zero
  *
  * Structure used by VDUSE_VQ_SETUP ioctl to setup a virtqueue.
  */
 struct vduse_vq_config {
 	__u32 index;
 	__u16 max_size;
-	__u16 reserved[13];
+	__u16 reserved1;
+	__u32 group;
+	__u16 reserved2[10];
 };
 
 /*
···
 	__u16 last_avail_idx;
 	__u16 last_used_counter;
 	__u16 last_used_idx;
+};
+
+/**
+ * struct vduse_vq_group_asid - virtqueue group ASID
+ * @group: Index of the virtqueue group
+ * @asid: Address space ID of the group
+ */
+struct vduse_vq_group_asid {
+	__u32 group;
+	__u32 asid;
 };
 
 /**
···
  * @uaddr: start address of userspace memory, it must be aligned to page size
  * @iova: start of the IOVA region
  * @size: size of the IOVA region
+ * @asid: Address space ID of the IOVA region
  * @reserved: for future use, needs to be initialized to zero
  *
  * Structure used by VDUSE_IOTLB_REG_UMEM and VDUSE_IOTLB_DEREG_UMEM
···
 	__u64 uaddr;
 	__u64 iova;
 	__u64 size;
-	__u64 reserved[3];
+	__u32 asid;
+	__u32 reserved[5];
 };
 
 /* Register userspace memory for IOVA regions */
···
  * @start: start of the IOVA region
  * @last: last of the IOVA region
  * @capability: capability of the IOVA region
+ * @asid: Address space ID of the IOVA region, only if device API version >= 1
  * @reserved: for future use, needs to be initialized to zero
  *
  * Structure used by VDUSE_IOTLB_GET_INFO ioctl to get information of
···
 	__u64 last;
 #define VDUSE_IOVA_CAP_UMEM (1 << 0)
 	__u64 capability;
-	__u64 reserved[3];
+	__u32 asid; /* Only if device API version >= 1 */
+	__u32 reserved[5];
 };
 
 /*
···
  * and return some information on it. Caller should set start and last fields.
  */
 #define VDUSE_IOTLB_GET_INFO	_IOWR(VDUSE_BASE, 0x1a, struct vduse_iova_info)
+
+/**
+ * struct vduse_iotlb_entry_v2 - entry of IOTLB to describe one IOVA region
+ *
+ * @v1: the original vduse_iotlb_entry
+ * @asid: address space ID of the IOVA region
+ * @reserved: for future use, needs to be initialized to zero
+ *
+ * Structure used by VDUSE_IOTLB_GET_FD2 ioctl to find an overlapped IOVA region.
+ */
+struct vduse_iotlb_entry_v2 {
+	__u64 offset;
+	__u64 start;
+	__u64 last;
+	__u8 perm;
+	__u8 padding[7];
+	__u32 asid;
+	__u32 reserved[11];
+};
+
+/*
+ * Same as VDUSE_IOTLB_GET_FD but with vduse_iotlb_entry_v2 argument that
+ * support extra fields.
+ */
+#define VDUSE_IOTLB_GET_FD2	_IOWR(VDUSE_BASE, 0x1b, struct vduse_iotlb_entry_v2)
+
 
 /* The control messages definition for read(2)/write(2) on /dev/vduse/$NAME */
 
···
  * @VDUSE_SET_STATUS: set the device status
  * @VDUSE_UPDATE_IOTLB: Notify userspace to update the memory mapping for
  *                      specified IOVA range via VDUSE_IOTLB_GET_FD ioctl
+ * @VDUSE_SET_VQ_GROUP_ASID: Notify userspace to update the address space of a
+ *                           virtqueue group.
  */
 enum vduse_req_type {
 	VDUSE_GET_VQ_STATE,
 	VDUSE_SET_STATUS,
 	VDUSE_UPDATE_IOTLB,
+	VDUSE_SET_VQ_GROUP_ASID,
 };
 
 /**
···
 };
 
 /**
+ * struct vduse_iova_range_v2 - IOVA range [start, last] if API_VERSION >= 1
+ * @start: start of the IOVA range
+ * @last: last of the IOVA range
+ * @asid: address space ID of the IOVA range
+ */
+struct vduse_iova_range_v2 {
+	__u64 start;
+	__u64 last;
+	__u32 asid;
+	__u32 padding;
+};
+
+/**
  * struct vduse_dev_request - control request
  * @type: request type
  * @request_id: request id
···
  * @vq_state: virtqueue state, only index field is available
  * @s: device status
  * @iova: IOVA range for updating
+ * @iova_v2: IOVA range for updating if API_VERSION >= 1
+ * @vq_group_asid: ASID of a virtqueue group
  * @padding: padding
  *
  * Structure used by read(2) on /dev/vduse/$NAME.
···
 		struct vduse_vq_state vq_state;
 		struct vduse_dev_status s;
 		struct vduse_iova_range iova;
+		/* Following members but padding exist only if vduse api
+		 * version >= 1
+		 */
+		struct vduse_iova_range_v2 iova_v2;
+		struct vduse_vq_group_asid vq_group_asid;
 		__u32 padding[32];
 	};
 };
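Reviewer note: several hunks in this header carve new fields out of existing `reserved` arrays, which only keeps the UAPI stable if size and offsets stay put and no implicit padding appears. A compile-time check along the following lines can confirm that for two of the structures. This is a userspace sketch: the `demo_*` copies use `<stdint.h>` types in place of `__u16`/`__u32`/`__u64`, and the `_v0`/`_v1` suffixes are labels invented here for the before/after layouts.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Layout of struct vduse_vq_config before and after the hunk above. */
struct demo_vq_config_v0 {
	uint32_t index;
	uint16_t max_size;
	uint16_t reserved[13];
};

struct demo_vq_config_v1 {
	uint32_t index;
	uint16_t max_size;
	uint16_t reserved1;
	uint32_t group;
	uint16_t reserved2[10];
};

/* Layout of struct vduse_iova_umem before and after. */
struct demo_iova_umem_v0 {
	uint64_t uaddr;
	uint64_t iova;
	uint64_t size;
	uint64_t reserved[3];
};

struct demo_iova_umem_v1 {
	uint64_t uaddr;
	uint64_t iova;
	uint64_t size;
	uint32_t asid;
	uint32_t reserved[5];
};

/* The new fields must fit exactly into the space the old reserved[] used. */
_Static_assert(sizeof(struct demo_vq_config_v0) == sizeof(struct demo_vq_config_v1),
	       "vq_config size changed");
_Static_assert(offsetof(struct demo_vq_config_v1, group) % sizeof(uint32_t) == 0,
	       "group is not naturally aligned");
_Static_assert(sizeof(struct demo_iova_umem_v0) == sizeof(struct demo_iova_umem_v1),
	       "iova_umem size changed");
_Static_assert(offsetof(struct demo_iova_umem_v1, asid) ==
	       offsetof(struct demo_iova_umem_v0, reserved),
	       "asid does not start where reserved[] did");
```

The `reserved1` member in the new `vduse_vq_config` is what keeps `group` on a 4-byte boundary without compiler-inserted padding.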
+1 -4
include/uapi/linux/virtio_ring.h
···
  * SUCH DAMAGE.
  *
  * Copyright Rusty Russell IBM Corporation 2007. */
-#ifndef __KERNEL__
-#include <stdint.h>
-#endif
 #include <linux/types.h>
 #include <linux/virtio_types.h>
 
···
 	vr->num = num;
 	vr->desc = p;
 	vr->avail = (struct vring_avail *)((char *)p + num * sizeof(struct vring_desc));
-	vr->used = (void *)(((uintptr_t)&vr->avail->ring[num] + sizeof(__virtio16)
+	vr->used = (void *)(((unsigned long)&vr->avail->ring[num] + sizeof(__virtio16)
 		+ align-1) & ~(align - 1));
 }
 
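Reviewer note: the expression kept by this hunk, `(x + align-1) & ~(align - 1)`, rounds an address up to the next multiple of `align`. The helper below isolates that arithmetic; `demo_align_up` is a hypothetical name, and the sketch assumes `align` is a power of two, since the mask trick does not work otherwise.

```c
#include <assert.h>

/*
 * Round addr up to the next multiple of align.
 * Only valid when align is a power of two.
 */
static inline unsigned long demo_align_up(unsigned long addr, unsigned long align)
{
	return (addr + align - 1) & ~(align - 1);
}
```

For example, `demo_align_up(4097, 4096)` yields 8192, while an already aligned value such as 4096 is returned unchanged. The switch from `uintptr_t` to `unsigned long` relies on `unsigned long` being pointer-sized, which holds on the data models Linux uses.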
+23 -5
kernel/dma/debug.c
···
  * @sg_mapped_ents: 'mapped_ents' from dma_map_sg
  * @paddr: physical start address of the mapping
  * @map_err_type: track whether dma_mapping_error() was checked
+ * @is_cache_clean: driver promises not to write to buffer while mapped
  * @stack_len: number of backtrace entries in @stack_entries
  * @stack_entries: stack of backtrace history
  */
···
 	int sg_call_ents;
 	int sg_mapped_ents;
 	phys_addr_t paddr;
-	enum map_err_types map_err_type;
+	enum map_err_types map_err_type;
+	bool is_cache_clean;
 #ifdef CONFIG_STACKTRACE
 	unsigned int stack_len;
 	unsigned long stack_entries[DMA_DEBUG_STACKTRACE_ENTRIES];
···
 	return active_cacheline_set_overlap(cln, --overlap);
 }
 
-static int active_cacheline_insert(struct dma_debug_entry *entry)
+static int active_cacheline_insert(struct dma_debug_entry *entry,
+				   bool *overlap_cache_clean)
 {
 	phys_addr_t cln = to_cacheline_number(entry);
 	unsigned long flags;
 	int rc;
+
+	*overlap_cache_clean = false;
 
 	/* If the device is not writing memory then we don't have any
 	 * concerns about the cpu consuming stale data. This mitigates
···
 
 	spin_lock_irqsave(&radix_lock, flags);
 	rc = radix_tree_insert(&dma_active_cacheline, cln, entry);
-	if (rc == -EEXIST)
+	if (rc == -EEXIST) {
+		struct dma_debug_entry *existing;
+
 		active_cacheline_inc_overlap(cln);
+		existing = radix_tree_lookup(&dma_active_cacheline, cln);
+		/* A lookup failure here after we got -EEXIST is unexpected. */
+		WARN_ON(!existing);
+		if (existing)
+			*overlap_cache_clean = existing->is_cache_clean;
+	}
 	spin_unlock_irqrestore(&radix_lock, flags);
 
 	return rc;
···
  */
 static void add_dma_entry(struct dma_debug_entry *entry, unsigned long attrs)
 {
+	bool overlap_cache_clean;
 	struct hash_bucket *bucket;
 	unsigned long flags;
 	int rc;
+
+	entry->is_cache_clean = !!(attrs & DMA_ATTR_CPU_CACHE_CLEAN);
 
 	bucket = get_hash_bucket(entry, &flags);
 	hash_bucket_add(bucket, entry);
 	put_hash_bucket(bucket, flags);
 
-	rc = active_cacheline_insert(entry);
+	rc = active_cacheline_insert(entry, &overlap_cache_clean);
 	if (rc == -ENOMEM) {
 		pr_err_once("cacheline tracking ENOMEM, dma-debug disabled\n");
 		global_disable = true;
-	} else if (rc == -EEXIST && !(attrs & DMA_ATTR_SKIP_CPU_SYNC) &&
+	} else if (rc == -EEXIST &&
+		   !(attrs & DMA_ATTR_SKIP_CPU_SYNC) &&
+		   !(entry->is_cache_clean && overlap_cache_clean) &&
 		   !(IS_ENABLED(CONFIG_DMA_BOUNCE_UNALIGNED_KMALLOC) &&
 		     is_swiotlb_active(entry->dev))) {
 		err_printk(entry->dev, entry,
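Reviewer note: after this change the overlap warning in `add_dma_entry()` is suppressed only when both the new mapping and the mapping already tracked for that cacheline carry DMA_ATTR_CPU_CACHE_CLEAN. The predicate below restates that condition as a standalone function for readability. It is a sketch, not the kernel code: the `demo_overlap` structure and its field names are hypothetical, and the swiotlb bounce condition is collapsed into a single flag.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical summary of the inputs add_dma_entry() looks at. */
struct demo_overlap {
	bool cacheline_already_tracked;	/* insert returned -EEXIST */
	bool skip_cpu_sync;		/* DMA_ATTR_SKIP_CPU_SYNC set */
	bool new_is_cache_clean;	/* new mapping has CPU_CACHE_CLEAN */
	bool existing_is_cache_clean;	/* tracked mapping has it too */
	bool bounce_active;		/* bounce-buffer exemption applies */
};

/* Returns true when the overlapping-cacheline warning would be printed. */
static inline bool demo_should_warn(const struct demo_overlap *o)
{
	return o->cacheline_already_tracked &&
	       !o->skip_cpu_sync &&
	       !(o->new_is_cache_clean && o->existing_is_cache_clean) &&
	       !o->bounce_active;
}
```

A mapping that sets the attribute while its neighbour does not still triggers the warning, which matches the header comment that all mappings sharing a cacheline must carry the attribute.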
+11 -8
net/vmw_vsock/virtio_transport.c
···
 #include <linux/virtio_ids.h>
 #include <linux/virtio_config.h>
 #include <linux/virtio_vsock.h>
+#include <linux/dma-mapping.h>
 #include <net/sock.h>
 #include <linux/mutex.h>
 #include <net/af_vsock.h>
···
 	int rx_buf_nr;
 	int rx_buf_max_nr;
 
-	/* The following fields are protected by event_lock.
-	 * vqs[VSOCK_VQ_EVENT] must be accessed with event_lock held.
-	 */
-	struct mutex event_lock;
-	bool event_run;
-	struct virtio_vsock_event event_list[8];
-
 	u32 guest_cid;
 	bool seqpacket_allow;
 
···
 	 */
 	struct scatterlist *out_sgs[MAX_SKB_FRAGS + 1];
 	struct scatterlist out_bufs[MAX_SKB_FRAGS + 1];
+
+	/* The following fields are protected by event_lock.
+	 * vqs[VSOCK_VQ_EVENT] must be accessed with event_lock held.
+	 */
+	struct mutex event_lock;
+	bool event_run;
+	__dma_from_device_group_begin();
+	struct virtio_vsock_event event_list[8];
+	__dma_from_device_group_end();
 };
 
 static u32 virtio_transport_get_local_cid(void)
···
 
 	sg_init_one(&sg, event, sizeof(*event));
 
-	return virtqueue_add_inbuf(vq, &sg, 1, event, GFP_KERNEL);
+	return virtqueue_add_inbuf_cache_clean(vq, &sg, 1, event, GFP_KERNEL);
 }
 
 /* event_lock must be held */
+3 -1
scripts/checkpatch.pl
···
 	(?:$Storage\s+)?(?:[A-Z_][A-Z0-9]*_){0,2}(?:DEFINE|DECLARE)(?:_[A-Z0-9]+){1,6}\s*\(|
 	(?:$Storage\s+)?[HLP]?LIST_HEAD\s*\(|
 	(?:SKCIPHER_REQUEST|SHASH_DESC|AHASH_REQUEST)_ON_STACK\s*\(|
-	(?:$Storage\s+)?(?:XA_STATE|XA_STATE_ORDER)\s*\(
+	(?:$Storage\s+)?(?:XA_STATE|XA_STATE_ORDER)\s*\(|
+	__cacheline_group_(?:begin|end)(?:_aligned)?\s*\(|
+	__dma_from_device_group_(?:begin|end)\s*\(
 )};
 
 our %allow_repeated_words = (
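Reviewer note: the two alternatives added to this checkpatch pattern match the cacheline and DMA group marker macros followed by an opening parenthesis. To see which lines they accept, the sketch below ports just those two alternatives to a POSIX extended regular expression and tests a line against it. It is an approximation: Perl's `(?:...)` and `\s` are rewritten as plain groups and `[[:space:]]`, the rest of the checkpatch expression is left out, and `demo_is_group_marker` is a hypothetical helper.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* POSIX ERE port of the two alternatives added in the hunk above. */
#define DEMO_GROUP_MARKER_RE \
	"__cacheline_group_(begin|end)(_aligned)?[[:space:]]*\\(|" \
	"__dma_from_device_group_(begin|end)[[:space:]]*\\("

/* Returns 1 if line contains a group marker, 0 if not, -1 on regex error. */
int demo_is_group_marker(const char *line)
{
	regex_t re;
	int rc;

	if (regcomp(&re, DEMO_GROUP_MARKER_RE, REG_EXTENDED | REG_NOSUB) != 0)
		return -1;
	rc = regexec(&re, line, 0, NULL, 0);
	regfree(&re);
	return rc == 0;
}
```

A line such as `__dma_from_device_group_begin();` matches, as does `__cacheline_group_end_aligned(rx);`, while a misspelled marker like `__dma_from_device_group_start();` does not.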