Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost

+1

Documentation/driver-api/index.rst

···
    vfio-mediated-device
    vfio
    vfio-pci-device-specific-driver-acceptance
+   virtio/index
    xilinx/index
    xillybus
    zorro

+11

Documentation/driver-api/virtio/index.rst

···
+.. SPDX-License-Identifier: GPL-2.0
+
+======
+Virtio
+======
+
+.. toctree::
+   :maxdepth: 1
+
+   virtio
+   writing_virtio_drivers

+145

Documentation/driver-api/virtio/virtio.rst

···
+.. SPDX-License-Identifier: GPL-2.0
+
+.. _virtio:
+
+===============
+Virtio on Linux
+===============
+
+Introduction
+============
+
+Virtio is an open standard that defines a protocol for communication
+between drivers and devices of different types (see Chapter 5, "Device
+Types", of the virtio spec, `[1]`_). Originally developed as a standard
+for paravirtualized devices implemented by a hypervisor, it can be used
+to interface any compliant device (real or emulated) with a driver.
+
+For illustrative purposes, this document will focus on the common case
+of a Linux kernel running in a virtual machine and using paravirtualized
+devices provided by the hypervisor, which exposes them as virtio devices
+via standard mechanisms such as PCI.
+
+
+Device - Driver communication: virtqueues
+=========================================
+
+Although the virtio devices are really an abstraction layer in the
+hypervisor, they're exposed to the guest as if they were physical devices
+using a specific transport method -- PCI, MMIO or CCW -- that is
+orthogonal to the device itself. The virtio spec defines these transport
+methods in detail, including device discovery, capabilities and
+interrupt handling.
+
+The communication between the driver in the guest OS and the device in
+the hypervisor is done through shared memory (that's what makes virtio
+devices so efficient) using specialized data structures called
+virtqueues, which are actually ring buffers [#f1]_ of buffer descriptors
+similar to the ones used in a network device:
+
+.. kernel-doc:: include/uapi/linux/virtio_ring.h
+    :identifiers: struct vring_desc
+
+All the buffers the descriptors point to are allocated by the guest and
+used by the host either for reading or for writing, but not for both.
+
+Refer to Chapter 2.5 ("Virtqueues") of the virtio spec (`[1]`_) for the
+reference definitions of virtqueues and to the "Virtqueues and virtio
+ring: How the data travels" blog post (`[2]`_) for an illustrated
+overview of how the host device and the guest driver communicate.
+
+The :c:type:`vring_virtqueue` struct models a virtqueue, including the
+ring buffers and management data. Embedded in this struct is the
+:c:type:`virtqueue` struct, which is the data structure that's
+ultimately used by virtio drivers:
+
+.. kernel-doc:: include/linux/virtio.h
+    :identifiers: struct virtqueue
+
+The callback function pointed to by this struct is triggered when the
+device has consumed the buffers provided by the driver. More
+specifically, the trigger will be an interrupt issued by the hypervisor
+(see vring_interrupt()). Interrupt request handlers are registered for
+a virtqueue during the virtqueue setup process (transport-specific).
+
+.. kernel-doc:: drivers/virtio/virtio_ring.c
+    :identifiers: vring_interrupt
+
+
+Device discovery and probing
+============================
+
+In the kernel, the virtio core contains the virtio bus driver and
+transport-specific drivers like `virtio-pci` and `virtio-mmio`. Then
+there are individual virtio drivers for specific device types that are
+registered to the virtio bus driver.
+
+How a virtio device is found and configured by the kernel depends on how
+the hypervisor defines it. Take the `QEMU virtio-console
+<https://gitlab.com/qemu-project/qemu/-/blob/master/hw/char/virtio-console.c>`__
+device as an example: when using PCI as a transport method, the device
+will present itself on the PCI bus with vendor 0x1af4 (Red Hat, Inc.)
+and device id 0x1003 (virtio console), as defined in the spec, so the
+kernel will detect it as it would do with any other PCI device.
+
+During the PCI enumeration process, if a device is found to match the
+virtio-pci driver (according to the virtio-pci device table, any PCI
+device with vendor id = 0x1af4)::
+
+	/* Qumranet donated their vendor ID for devices 0x1000 thru 0x10FF. */
+	static const struct pci_device_id virtio_pci_id_table[] = {
+		{ PCI_DEVICE(PCI_VENDOR_ID_REDHAT_QUMRANET, PCI_ANY_ID) },
+		{ 0 }
+	};
+
+then the virtio-pci driver is probed and, if the probing goes well, the
+device is registered to the virtio bus::
+
+	static int virtio_pci_probe(struct pci_dev *pci_dev,
+				    const struct pci_device_id *id)
+	{
+		...
+
+		if (force_legacy) {
+			rc = virtio_pci_legacy_probe(vp_dev);
+			/* Also try modern mode if we can't map BAR0 (no IO space). */
+			if (rc == -ENODEV || rc == -ENOMEM)
+				rc = virtio_pci_modern_probe(vp_dev);
+			if (rc)
+				goto err_probe;
+		} else {
+			rc = virtio_pci_modern_probe(vp_dev);
+			if (rc == -ENODEV)
+				rc = virtio_pci_legacy_probe(vp_dev);
+			if (rc)
+				goto err_probe;
+		}
+
+		...
+
+		rc = register_virtio_device(&vp_dev->vdev);
+
+When the device is registered to the virtio bus, the kernel will look
+for a driver in the bus that can handle the device and call that
+driver's ``probe`` method.
+
+At this point, the virtqueues will be allocated and configured by
+calling the appropriate ``virtio_find`` helper function, such as
+virtio_find_single_vq() or virtio_find_vqs(), which will end up calling
+a transport-specific ``find_vqs`` method.
+
+
+References
+==========
+
+_`[1]` Virtio Spec v1.2:
+https://docs.oasis-open.org/virtio/virtio/v1.2/virtio-v1.2.html
+
+.. Check for later versions of the spec as well.
+
+_`[2]` Virtqueues and virtio ring: How the data travels
+https://www.redhat.com/en/blog/virtqueues-and-virtio-ring-how-data-travels
+
+.. rubric:: Footnotes
+
+.. [#f1] That's why they may also be referred to as virtrings.
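Annotation on the virtqueue description in virtio.rst: the document says each descriptor points to a guest-allocated buffer that the device either reads or writes, never both. The following is a minimal user-space sketch of how a descriptor chain could be walked and classified. It is not kernel code: `struct toy_desc`, `struct toy_chain_info` and `toy_walk_chain()` are hypothetical names, and the fields are kept host-endian for simplicity (the real `struct vring_desc` uses little-endian fields). Only the two flag values are taken from the virtio spec.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Flag values defined by the virtio spec for split virtqueues. */
#define VRING_DESC_F_NEXT  1u	/* the chain continues at 'next' */
#define VRING_DESC_F_WRITE 2u	/* buffer is written by the device */

/* Hypothetical host-endian stand-in for struct vring_desc. */
struct toy_desc {
	uint64_t addr;	/* guest-physical address of the buffer */
	uint32_t len;	/* buffer length in bytes */
	uint16_t flags;	/* VRING_DESC_F_* */
	uint16_t next;	/* next descriptor index, valid if F_NEXT is set */
};

struct toy_chain_info {
	uint32_t readable_bytes;	/* device-readable part of the chain */
	uint32_t writable_bytes;	/* device-writable part of the chain */
	unsigned int ndesc;		/* number of descriptors visited */
};

/*
 * Walk the chain that starts at 'head' in a table of 'n' descriptors.
 * Returns 0 on success, -1 if the chain leaves the table or loops.
 */
static int toy_walk_chain(const struct toy_desc *table, unsigned int n,
			  uint16_t head, struct toy_chain_info *out)
{
	unsigned int i = head;

	assert(table && out);
	out->readable_bytes = 0;
	out->writable_bytes = 0;
	out->ndesc = 0;

	for (;;) {
		/* A valid chain can never be longer than the table. */
		if (i >= n || out->ndesc >= n)
			return -1;

		if (table[i].flags & VRING_DESC_F_WRITE)
			out->writable_bytes += table[i].len;
		else
			out->readable_bytes += table[i].len;
		out->ndesc++;

		if (!(table[i].flags & VRING_DESC_F_NEXT))
			return 0;
		i = table[i].next;
	}
}
```

A block-style request, for example, would typically be a chain with a device-readable header followed by device-writable data and status buffers; the walker reports the two byte counts separately.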

+197

Documentation/driver-api/virtio/writing_virtio_drivers.rst

···
+.. SPDX-License-Identifier: GPL-2.0
+
+.. _writing_virtio_drivers:
+
+======================
+Writing Virtio Drivers
+======================
+
+Introduction
+============
+
+This document serves as a basic guideline for driver programmers who
+need to hack a new virtio driver or understand the essentials of the
+existing ones. See :ref:`Virtio on Linux <virtio>` for a general
+overview of virtio.
+
+
+Driver boilerplate
+==================
+
+As a bare minimum, a virtio driver needs to register in the virtio bus
+and configure the virtqueues for the device according to its spec; the
+configuration of the virtqueues on the driver side must match the
+virtqueue definitions in the device. A basic driver skeleton could look
+like this::
+
+	#include <linux/virtio.h>
+	#include <linux/virtio_ids.h>
+	#include <linux/virtio_config.h>
+	#include <linux/module.h>
+	#include <linux/slab.h>
+
+	/* device private data (one per device) */
+	struct virtio_dummy_dev {
+		struct virtqueue *vq;
+	};
+
+	static void virtio_dummy_recv_cb(struct virtqueue *vq)
+	{
+		struct virtio_dummy_dev *dev = vq->vdev->priv;
+		char *buf;
+		unsigned int len;
+
+		while ((buf = virtqueue_get_buf(dev->vq, &len)) != NULL) {
+			/* process the received data */
+		}
+	}
+
+	static int virtio_dummy_probe(struct virtio_device *vdev)
+	{
+		struct virtio_dummy_dev *dev = NULL;
+
+		/* initialize device data */
+		dev = kzalloc(sizeof(struct virtio_dummy_dev), GFP_KERNEL);
+		if (!dev)
+			return -ENOMEM;
+
+		/* the device has a single virtqueue */
+		dev->vq = virtio_find_single_vq(vdev, virtio_dummy_recv_cb, "input");
+		if (IS_ERR(dev->vq)) {
+			/* read the error code before freeing dev */
+			int err = PTR_ERR(dev->vq);
+
+			kfree(dev);
+			return err;
+		}
+		vdev->priv = dev;
+
+		/* from this point on, the device can notify and get callbacks */
+		virtio_device_ready(vdev);
+
+		return 0;
+	}
+
+	static void virtio_dummy_remove(struct virtio_device *vdev)
+	{
+		struct virtio_dummy_dev *dev = vdev->priv;
+		void *buf;
+
+		/*
+		 * disable vq interrupts: equivalent to
+		 * vdev->config->reset(vdev)
+		 */
+		virtio_reset_device(vdev);
+
+		/* detach unused buffers */
+		while ((buf = virtqueue_detach_unused_buf(dev->vq)) != NULL) {
+			kfree(buf);
+		}
+
+		/* remove virtqueues */
+		vdev->config->del_vqs(vdev);
+
+		kfree(dev);
+	}
+
+	static const struct virtio_device_id id_table[] = {
+		{ VIRTIO_ID_DUMMY, VIRTIO_DEV_ANY_ID },
+		{ 0 },
+	};
+
+	static struct virtio_driver virtio_dummy_driver = {
+		.driver.name = KBUILD_MODNAME,
+		.driver.owner = THIS_MODULE,
+		.id_table = id_table,
+		.probe = virtio_dummy_probe,
+		.remove = virtio_dummy_remove,
+	};
+
+	module_virtio_driver(virtio_dummy_driver);
+	MODULE_DEVICE_TABLE(virtio, id_table);
+	MODULE_DESCRIPTION("Dummy virtio driver");
+	MODULE_LICENSE("GPL");
+
+The device id ``VIRTIO_ID_DUMMY`` here is a placeholder; virtio drivers
+should be added only for devices that are defined in the spec, see
+include/uapi/linux/virtio_ids.h. Device ids need to be at least reserved
+in the virtio spec before being added to that file.
+
+If your driver doesn't have to do anything special in its ``init`` and
+``exit`` methods, you can use the module_virtio_driver() helper to
+reduce the amount of boilerplate code.
+
+The ``probe`` method does the minimum driver setup in this case
+(memory allocation for the device data) and initializes the
+virtqueue. virtio_device_ready() is used to enable the virtqueue and to
+notify the device that the driver is ready to manage the device
+("DRIVER_OK"). The virtqueues are enabled automatically by the core
+after ``probe`` returns anyway.
+
+.. kernel-doc:: include/linux/virtio_config.h
+    :identifiers: virtio_device_ready
+
+In any case, the virtqueues need to be enabled before adding buffers to
+them.
+
+Sending and receiving data
+==========================
+
+The virtio_dummy_recv_cb() callback in the code above will be triggered
+when the device notifies the driver after it finishes processing a
+descriptor or descriptor chain, either for reading or writing. However,
+that's only the second half of the virtio device-driver communication
+process, as the communication is always started by the driver regardless
+of the direction of the data transfer.
+
+To configure a buffer transfer from the driver to the device, first you
+have to add the buffers -- packed as `scatterlists` -- to the
+appropriate virtqueue using one of virtqueue_add_inbuf(),
+virtqueue_add_outbuf() or virtqueue_add_sgs(), depending on whether you
+need to add one input `scatterlist` (for the device to fill in), one
+output `scatterlist` (for the device to consume) or multiple
+`scatterlists`, respectively. Then, once the virtqueue is set up, a call
+to virtqueue_kick() sends a notification that will be serviced by the
+hypervisor that implements the device::
+
+	struct scatterlist sg[1];
+	sg_init_one(sg, buffer, BUFLEN);
+	virtqueue_add_inbuf(dev->vq, sg, 1, buffer, GFP_ATOMIC);
+	virtqueue_kick(dev->vq);
+
+.. kernel-doc:: drivers/virtio/virtio_ring.c
+    :identifiers: virtqueue_add_inbuf
+
+.. kernel-doc:: drivers/virtio/virtio_ring.c
+    :identifiers: virtqueue_add_outbuf
+
+.. kernel-doc:: drivers/virtio/virtio_ring.c
+    :identifiers: virtqueue_add_sgs
+
+Then, after the device has read or written the buffers prepared by the
+driver and has notified the driver back, the driver can call
+virtqueue_get_buf() to read the data produced by the device (if the
+virtqueue was set up with input buffers) or simply to reclaim the
+buffers if they were already consumed by the device:
+
+.. kernel-doc:: drivers/virtio/virtio_ring.c
+    :identifiers: virtqueue_get_buf_ctx
+
+The virtqueue callbacks can be disabled and re-enabled using
+virtqueue_disable_cb() and the virtqueue_enable_cb() family of
+functions, respectively. See drivers/virtio/virtio_ring.c for more details:
+
+.. kernel-doc:: drivers/virtio/virtio_ring.c
+    :identifiers: virtqueue_disable_cb
+
+.. kernel-doc:: drivers/virtio/virtio_ring.c
+    :identifiers: virtqueue_enable_cb
+
+But note that some spurious callbacks can still be triggered under
+certain scenarios. The way to disable callbacks reliably is to reset the
+device or the virtqueue (virtio_reset_device()).
+
+
+References
+==========
+
+_`[1]` Virtio Spec v1.2:
+https://docs.oasis-open.org/virtio/virtio/v1.2/virtio-v1.2.html
+
+Check for later versions of the spec as well.
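Annotation on "Sending and receiving data" in writing_virtio_drivers.rst: the text stresses that the driver always starts the exchange (add buffers, then kick) and reclaims buffers afterwards. The toy model below illustrates that ordering in plain user-space C. It is only a sketch under loud assumptions: every `toy_*` name is hypothetical, the "device" is a function called synchronously on kick, and nothing here reflects how the kernel's virtqueue code is actually implemented.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_QUEUE_SIZE 4

enum toy_state { TOY_FREE, TOY_AVAIL, TOY_USED };

/* One posted buffer: the driver's token plus what the device did with it. */
struct toy_slot {
	void *token;		/* opaque cookie handed back by toy_get_buf() */
	char *buf;		/* device-writable ("input") buffer */
	unsigned int cap;	/* capacity of buf in bytes */
	unsigned int used;	/* bytes written by the device */
	enum toy_state state;
};

struct toy_vq {
	struct toy_slot slots[TOY_QUEUE_SIZE];
};

static void toy_vq_init(struct toy_vq *vq)
{
	assert(vq);
	memset(vq, 0, sizeof(*vq));	/* all slots start as TOY_FREE */
}

/* Driver side: post an input buffer. Returns 0, or -1 if the queue is full. */
static int toy_add_inbuf(struct toy_vq *vq, char *buf, unsigned int cap,
			 void *token)
{
	for (int i = 0; i < TOY_QUEUE_SIZE; i++) {
		struct toy_slot *s = &vq->slots[i];

		if (s->state == TOY_FREE) {
			s->token = token;
			s->buf = buf;
			s->cap = cap;
			s->used = 0;
			s->state = TOY_AVAIL;
			return 0;
		}
	}
	return -1;
}

/* "Device" side, run on kick: fill every available buffer with 'msg'. */
static void toy_kick(struct toy_vq *vq, const char *msg)
{
	unsigned int len = (unsigned int)strlen(msg);

	for (int i = 0; i < TOY_QUEUE_SIZE; i++) {
		struct toy_slot *s = &vq->slots[i];

		if (s->state != TOY_AVAIL)
			continue;
		s->used = len < s->cap ? len : s->cap;
		memcpy(s->buf, msg, s->used);
		s->state = TOY_USED;
	}
}

/* Driver side: reclaim one used buffer. Returns its token, or NULL. */
static void *toy_get_buf(struct toy_vq *vq, unsigned int *len)
{
	assert(len);
	for (int i = 0; i < TOY_QUEUE_SIZE; i++) {
		struct toy_slot *s = &vq->slots[i];

		if (s->state == TOY_USED) {
			*len = s->used;
			s->state = TOY_FREE;
			return s->token;
		}
	}
	return NULL;
}
```

In this model nothing comes back from `toy_get_buf()` until a buffer has been posted and the queue has been kicked, which mirrors the point made in the text: the completion callback is only the second half of the exchange.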

+5

MAINTAINERS

···
 F:	Documentation/ABI/testing/sysfs-bus-vdpa
 F:	Documentation/ABI/testing/sysfs-class-vduse
 F:	Documentation/devicetree/bindings/virtio/
+F:	Documentation/driver-api/virtio/
 F:	drivers/block/virtio_blk.c
 F:	drivers/crypto/virtio/
 F:	drivers/net/virtio_net.c
···
 IFCVF VIRTIO DATA PATH ACCELERATOR
 R:	Zhu Lingshan <lingshan.zhu@intel.com>
 F:	drivers/vdpa/ifcvf/
+
+SNET DPU VIRTIO DATA PATH ACCELERATOR
+R:	Alvaro Karsz <alvaro.karsz@solid-run.com>
+F:	drivers/vdpa/solidrun/
 
 VIRTIO BALLOON
 M:	"Michael S. Tsirkin" <mst@redhat.com>

+414 -54

drivers/block/virtio_blk.c

···
 #include <linux/blk-mq.h>
 #include <linux/blk-mq-virtio.h>
 #include <linux/numa.h>
+#include <linux/vmalloc.h>
 #include <uapi/linux/virtio_ring.h>
 
 #define PART_BITS 4
···
 	int num_vqs;
 	int io_queues[HCTX_MAX_TYPES];
 	struct virtio_blk_vq *vqs;
+
+	/* For zoned device */
+	unsigned int zone_sectors;
 };
 
 struct virtblk_req {
+	/* Out header */
 	struct virtio_blk_outhdr out_hdr;
-	u8 status;
+
+	/* In header */
+	union {
+		u8 status;
+
+		/*
+		 * The zone append command has an extended in header.
+		 * The status field in zone_append_in_hdr must have
+		 * the same offset in virtblk_req as the non-zoned
+		 * status field above.
+		 */
+		struct {
+			u8 status;
+			u8 reserved[7];
+			__le64 append_sector;
+		} zone_append_in_hdr;
+	};
+
+	size_t in_hdr_len;
+
 	struct sg_table sg_table;
 	struct scatterlist sg[];
 };
 
-static inline blk_status_t virtblk_result(struct virtblk_req *vbr)
+static inline blk_status_t virtblk_result(u8 status)
 {
-	switch (vbr->status) {
+	switch (status) {
 	case VIRTIO_BLK_S_OK:
 		return BLK_STS_OK;
 	case VIRTIO_BLK_S_UNSUPP:
 		return BLK_STS_NOTSUPP;
+	case VIRTIO_BLK_S_ZONE_OPEN_RESOURCE:
+		return BLK_STS_ZONE_OPEN_RESOURCE;
+	case VIRTIO_BLK_S_ZONE_ACTIVE_RESOURCE:
+		return BLK_STS_ZONE_ACTIVE_RESOURCE;
+	case VIRTIO_BLK_S_IOERR:
+	case VIRTIO_BLK_S_ZONE_UNALIGNED_WP:
 	default:
 		return BLK_STS_IOERR;
 	}
···
 
 static int virtblk_add_req(struct virtqueue *vq, struct virtblk_req *vbr)
 {
-	struct scatterlist hdr, status, *sgs[3];
+	struct scatterlist out_hdr, in_hdr, *sgs[3];
 	unsigned int num_out = 0, num_in = 0;
 
-	sg_init_one(&hdr, &vbr->out_hdr, sizeof(vbr->out_hdr));
-	sgs[num_out++] = &hdr;
+	sg_init_one(&out_hdr, &vbr->out_hdr, sizeof(vbr->out_hdr));
+	sgs[num_out++] = &out_hdr;
 
 	if (vbr->sg_table.nents) {
 		if (vbr->out_hdr.type & cpu_to_virtio32(vq->vdev, VIRTIO_BLK_T_OUT))
···
 			sgs[num_out + num_in++] = vbr->sg_table.sgl;
 	}
 
-	sg_init_one(&status, &vbr->status, sizeof(vbr->status));
-	sgs[num_out + num_in++] = &status;
+	sg_init_one(&in_hdr, &vbr->status, vbr->in_hdr_len);
+	sgs[num_out + num_in++] = &in_hdr;
 
 	return virtqueue_add_sgs(vq, sgs, num_out, num_in, vbr, GFP_ATOMIC);
 }
···
 				      struct request *req,
 				      struct virtblk_req *vbr)
 {
+	size_t in_hdr_len = sizeof(vbr->status);
 	bool unmap = false;
 	u32 type;
+	u64 sector = 0;
 
-	vbr->out_hdr.sector = 0;
+	/* Set fields for all request types */
+	vbr->out_hdr.ioprio = cpu_to_virtio32(vdev, req_get_ioprio(req));
 
 	switch (req_op(req)) {
 	case REQ_OP_READ:
 		type = VIRTIO_BLK_T_IN;
-		vbr->out_hdr.sector = cpu_to_virtio64(vdev,
-						      blk_rq_pos(req));
+		sector = blk_rq_pos(req);
 		break;
 	case REQ_OP_WRITE:
 		type = VIRTIO_BLK_T_OUT;
-		vbr->out_hdr.sector = cpu_to_virtio64(vdev,
-						      blk_rq_pos(req));
+		sector = blk_rq_pos(req);
 		break;
 	case REQ_OP_FLUSH:
 		type = VIRTIO_BLK_T_FLUSH;
···
 	case REQ_OP_SECURE_ERASE:
 		type = VIRTIO_BLK_T_SECURE_ERASE;
 		break;
-	case REQ_OP_DRV_IN:
-		type = VIRTIO_BLK_T_GET_ID;
+	case REQ_OP_ZONE_OPEN:
+		type = VIRTIO_BLK_T_ZONE_OPEN;
+		sector = blk_rq_pos(req);
 		break;
+	case REQ_OP_ZONE_CLOSE:
+		type = VIRTIO_BLK_T_ZONE_CLOSE;
+		sector = blk_rq_pos(req);
+		break;
+	case REQ_OP_ZONE_FINISH:
+		type = VIRTIO_BLK_T_ZONE_FINISH;
+		sector = blk_rq_pos(req);
+		break;
+	case REQ_OP_ZONE_APPEND:
+		type = VIRTIO_BLK_T_ZONE_APPEND;
+		sector = blk_rq_pos(req);
+		in_hdr_len = sizeof(vbr->zone_append_in_hdr);
+		break;
+	case REQ_OP_ZONE_RESET:
+		type = VIRTIO_BLK_T_ZONE_RESET;
+		sector = blk_rq_pos(req);
+		break;
+	case REQ_OP_ZONE_RESET_ALL:
+		type = VIRTIO_BLK_T_ZONE_RESET_ALL;
+		break;
+	case REQ_OP_DRV_IN:
+		/* Out header already filled in, nothing to do */
+		return 0;
 	default:
 		WARN_ON_ONCE(1);
 		return BLK_STS_IOERR;
 	}
 
+	/* Set fields for non-REQ_OP_DRV_IN request types */
+	vbr->in_hdr_len = in_hdr_len;
 	vbr->out_hdr.type = cpu_to_virtio32(vdev, type);
-	vbr->out_hdr.ioprio = cpu_to_virtio32(vdev, req_get_ioprio(req));
+	vbr->out_hdr.sector = cpu_to_virtio64(vdev, sector);
 
 	if (type == VIRTIO_BLK_T_DISCARD || type == VIRTIO_BLK_T_WRITE_ZEROES ||
 	    type == VIRTIO_BLK_T_SECURE_ERASE) {
···
 static inline void virtblk_request_done(struct request *req)
 {
 	struct virtblk_req *vbr = blk_mq_rq_to_pdu(req);
+	blk_status_t status = virtblk_result(vbr->status);
 
 	virtblk_unmap_data(req, vbr);
 	virtblk_cleanup_cmd(req);
-	blk_mq_end_request(req, virtblk_result(vbr));
+
+	if (req_op(req) == REQ_OP_ZONE_APPEND)
+		req->__sector = le64_to_cpu(vbr->zone_append_in_hdr.append_sector);
+
+	blk_mq_end_request(req, status);
+}
+
+static void virtblk_complete_batch(struct io_comp_batch *iob)
+{
+	struct request *req;
+
+	rq_list_for_each(&iob->req_list, req) {
+		virtblk_unmap_data(req, blk_mq_rq_to_pdu(req));
+		virtblk_cleanup_cmd(req);
+	}
+	blk_mq_end_request_batch(iob);
+}
+
+static int virtblk_handle_req(struct virtio_blk_vq *vq,
+			      struct io_comp_batch *iob)
+{
+	struct virtblk_req *vbr;
+	int req_done = 0;
+	unsigned int len;
+
+	while ((vbr = virtqueue_get_buf(vq->vq, &len)) != NULL) {
+		struct request *req = blk_mq_rq_from_pdu(vbr);
+
+		if (likely(!blk_should_fake_timeout(req->q)) &&
+		    !blk_mq_complete_request_remote(req) &&
+		    !blk_mq_add_to_batch(req, iob, vbr->status,
+					 virtblk_complete_batch))
+			virtblk_request_done(req);
+		req_done++;
+	}
+
+	return req_done;
 }
 
 static void virtblk_done(struct virtqueue *vq)
 {
 	struct virtio_blk *vblk = vq->vdev->priv;
-	bool req_done = false;
-	int qid = vq->index;
-	struct virtblk_req *vbr;
+	struct virtio_blk_vq *vblk_vq = &vblk->vqs[vq->index];
+	int req_done = 0;
 	unsigned long flags;
-	unsigned int len;
+	DEFINE_IO_COMP_BATCH(iob);
 
-	spin_lock_irqsave(&vblk->vqs[qid].lock, flags);
+	spin_lock_irqsave(&vblk_vq->lock, flags);
 	do {
 		virtqueue_disable_cb(vq);
-		while ((vbr = virtqueue_get_buf(vblk->vqs[qid].vq, &len)) != NULL) {
-			struct request *req = blk_mq_rq_from_pdu(vbr);
+		req_done += virtblk_handle_req(vblk_vq, &iob);
 
-			if (likely(!blk_should_fake_timeout(req->q)))
-				blk_mq_complete_request(req);
-			req_done = true;
-		}
 		if (unlikely(virtqueue_is_broken(vq)))
 			break;
 	} while (!virtqueue_enable_cb(vq));
 
-	/* In case queue is stopped waiting for more buffers. */
-	if (req_done)
+	if (req_done) {
+		if (!rq_list_empty(iob.req_list))
+			iob.complete(&iob);
+
+		/* In case queue is stopped waiting for more buffers. */
 		blk_mq_start_stopped_hw_queues(vblk->disk->queue, true);
-	spin_unlock_irqrestore(&vblk->vqs[qid].lock, flags);
+	}
+	spin_unlock_irqrestore(&vblk_vq->lock, flags);
 }
 
 static void virtio_commit_rqs(struct blk_mq_hw_ctx *hctx)
···
 	*rqlist = requeue_list;
 }
 
+#ifdef CONFIG_BLK_DEV_ZONED
+static void *virtblk_alloc_report_buffer(struct virtio_blk *vblk,
+					  unsigned int nr_zones,
+					  unsigned int zone_sectors,
+					  size_t *buflen)
+{
+	struct request_queue *q = vblk->disk->queue;
+	size_t bufsize;
+	void *buf;
+
+	nr_zones = min_t(unsigned int, nr_zones,
+			 get_capacity(vblk->disk) >> ilog2(zone_sectors));
+
+	bufsize = sizeof(struct virtio_blk_zone_report) +
+		nr_zones * sizeof(struct virtio_blk_zone_descriptor);
+	bufsize = min_t(size_t, bufsize,
+			queue_max_hw_sectors(q) << SECTOR_SHIFT);
+	bufsize = min_t(size_t, bufsize, queue_max_segments(q) << PAGE_SHIFT);
+
+	while (bufsize >= sizeof(struct virtio_blk_zone_report)) {
+		buf = __vmalloc(bufsize, GFP_KERNEL | __GFP_NORETRY);
+		if (buf) {
+			*buflen = bufsize;
+			return buf;
+		}
+		bufsize >>= 1;
+	}
+
+	return NULL;
+}
+
+static int virtblk_submit_zone_report(struct virtio_blk *vblk,
+				       char *report_buf, size_t report_len,
+				       sector_t sector)
+{
+	struct request_queue *q = vblk->disk->queue;
+	struct request *req;
+	struct virtblk_req *vbr;
+	int err;
+
+	req = blk_mq_alloc_request(q, REQ_OP_DRV_IN, 0);
+	if (IS_ERR(req))
+		return PTR_ERR(req);
+
+	vbr = blk_mq_rq_to_pdu(req);
+	vbr->in_hdr_len = sizeof(vbr->status);
+	vbr->out_hdr.type = cpu_to_virtio32(vblk->vdev, VIRTIO_BLK_T_ZONE_REPORT);
+	vbr->out_hdr.sector = cpu_to_virtio64(vblk->vdev, sector);
+
+	err = blk_rq_map_kern(q, req, report_buf, report_len, GFP_KERNEL);
+	if (err)
+		goto out;
+
+	blk_execute_rq(req, false);
+	err = blk_status_to_errno(virtblk_result(vbr->status));
+out:
+	blk_mq_free_request(req);
+	return err;
+}
+
+static int virtblk_parse_zone(struct virtio_blk *vblk,
+			       struct virtio_blk_zone_descriptor *entry,
+			       unsigned int idx, unsigned int zone_sectors,
+			       report_zones_cb cb, void *data)
+{
+	struct blk_zone zone = { };
+
+	if (entry->z_type != VIRTIO_BLK_ZT_SWR &&
+	    entry->z_type != VIRTIO_BLK_ZT_SWP &&
+	    entry->z_type != VIRTIO_BLK_ZT_CONV) {
+		dev_err(&vblk->vdev->dev, "invalid zone type %#x\n",
+			entry->z_type);
+		return -EINVAL;
+	}
+
+	zone.type = entry->z_type;
+	zone.cond = entry->z_state;
+	zone.len = zone_sectors;
+	zone.capacity = le64_to_cpu(entry->z_cap);
+	zone.start = le64_to_cpu(entry->z_start);
+	if (zone.cond == BLK_ZONE_COND_FULL)
+		zone.wp = zone.start + zone.len;
+	else
+		zone.wp = le64_to_cpu(entry->z_wp);
+
+	return cb(&zone, idx, data);
+}
+
+static int virtblk_report_zones(struct gendisk *disk, sector_t sector,
+				 unsigned int nr_zones, report_zones_cb cb,
+				 void *data)
+{
+	struct virtio_blk *vblk = disk->private_data;
+	struct virtio_blk_zone_report *report;
+	unsigned int zone_sectors = vblk->zone_sectors;
+	unsigned int nz, i;
+	int ret, zone_idx = 0;
+	size_t buflen;
+
+	if (WARN_ON_ONCE(!vblk->zone_sectors))
+		return -EOPNOTSUPP;
+
+	report = virtblk_alloc_report_buffer(vblk, nr_zones,
+					     zone_sectors, &buflen);
+	if (!report)
+		return -ENOMEM;
+
+	while (zone_idx < nr_zones && sector < get_capacity(vblk->disk)) {
+		memset(report, 0, buflen);
+
+		ret = virtblk_submit_zone_report(vblk, (char *)report,
+						 buflen, sector);
+		if (ret) {
+			if (ret > 0)
+				ret = -EIO;
+			goto out_free;
+		}
+		nz = min((unsigned int)le64_to_cpu(report->nr_zones), nr_zones);
+		if (!nz)
+			break;
+
+		for (i = 0; i < nz && zone_idx < nr_zones; i++) {
+			ret = virtblk_parse_zone(vblk, &report->zones[i],
+						 zone_idx, zone_sectors, cb, data);
+			if (ret)
+				goto out_free;
+			sector = le64_to_cpu(report->zones[i].z_start) + zone_sectors;
+			zone_idx++;
+		}
+	}
+
+	if (zone_idx > 0)
+		ret = zone_idx;
+	else
+		ret = -EINVAL;
+out_free:
+	kvfree(report);
+	return ret;
+}
+
+static void virtblk_revalidate_zones(struct virtio_blk *vblk)
+{
+	u8 model;
+
+	if (!vblk->zone_sectors)
+		return;
+
+	virtio_cread(vblk->vdev, struct virtio_blk_config,
+		     zoned.model, &model);
+	if (!blk_revalidate_disk_zones(vblk->disk, NULL))
+		set_capacity_and_notify(vblk->disk, 0);
+}
+
+static int virtblk_probe_zoned_device(struct virtio_device *vdev,
+				       struct virtio_blk *vblk,
+				       struct request_queue *q)
+{
+	u32 v;
+	u8 model;
+	int ret;
+
+	virtio_cread(vdev, struct virtio_blk_config,
+		     zoned.model, &model);
+
+	switch (model) {
+	case VIRTIO_BLK_Z_NONE:
+		return 0;
+	case VIRTIO_BLK_Z_HM:
+		break;
+	case VIRTIO_BLK_Z_HA:
+		/*
+		 * Present the host-aware device as a regular drive.
+		 * TODO It is possible to add an option to make it appear
+		 * in the system as a zoned drive.
+		 */
+		return 0;
+	default:
+		dev_err(&vdev->dev, "unsupported zone model %d\n", model);
+		return -EINVAL;
+	}
+
+	dev_dbg(&vdev->dev, "probing host-managed zoned device\n");
+
+	disk_set_zoned(vblk->disk, BLK_ZONED_HM);
+	blk_queue_flag_set(QUEUE_FLAG_ZONE_RESETALL, q);
+
+	virtio_cread(vdev, struct virtio_blk_config,
+		     zoned.max_open_zones, &v);
+	disk_set_max_open_zones(vblk->disk, le32_to_cpu(v));
+
+	dev_dbg(&vdev->dev, "max open zones = %u\n", le32_to_cpu(v));
+
+	virtio_cread(vdev, struct virtio_blk_config,
+		     zoned.max_active_zones, &v);
+	disk_set_max_active_zones(vblk->disk, le32_to_cpu(v));
+	dev_dbg(&vdev->dev, "max active zones = %u\n", le32_to_cpu(v));
+
+	virtio_cread(vdev, struct virtio_blk_config,
+		     zoned.write_granularity, &v);
+	if (!v) {
+		dev_warn(&vdev->dev, "zero write granularity reported\n");
+		return -ENODEV;
+	}
+	blk_queue_physical_block_size(q, le32_to_cpu(v));
+	blk_queue_io_min(q, le32_to_cpu(v));
+
+	dev_dbg(&vdev->dev, "write granularity = %u\n", le32_to_cpu(v));
+
+	/*
+	 * virtio ZBD specification doesn't require zones to be a power of
+	 * two sectors in size, but the code in this driver expects that.
+	 */
+	virtio_cread(vdev, struct virtio_blk_config, zoned.zone_sectors, &v);
+	vblk->zone_sectors = le32_to_cpu(v);
+	if (vblk->zone_sectors == 0 || !is_power_of_2(vblk->zone_sectors)) {
+		dev_err(&vdev->dev,
+			"zoned device with non power of two zone size %u\n",
+			vblk->zone_sectors);
+		return -ENODEV;
+	}
+	dev_dbg(&vdev->dev, "zone sectors = %u\n", vblk->zone_sectors);
+
+	if (virtio_has_feature(vdev, VIRTIO_BLK_F_DISCARD)) {
+		dev_warn(&vblk->vdev->dev,
+			 "ignoring negotiated F_DISCARD for zoned device\n");
+		blk_queue_max_discard_sectors(q, 0);
+	}
+
+	ret = blk_revalidate_disk_zones(vblk->disk, NULL);
+	if (!ret) {
+		virtio_cread(vdev, struct virtio_blk_config,
+			     zoned.max_append_sectors, &v);
+		if (!v) {
+			dev_warn(&vdev->dev, "zero max_append_sectors reported\n");
+			return -ENODEV;
+		}
+		blk_queue_max_zone_append_sectors(q, le32_to_cpu(v));
+		dev_dbg(&vdev->dev, "max append sectors = %u\n", le32_to_cpu(v));
+	}
+
+	return ret;
+}
+
+static inline bool virtblk_has_zoned_feature(struct virtio_device *vdev)
+{
+	return virtio_has_feature(vdev, VIRTIO_BLK_F_ZONED);
+}
+#else
+
+/*
+ * Zoned block device support is not configured in this kernel.
+ * We only need to define a few symbols to avoid compilation errors.
+ */
+#define virtblk_report_zones NULL
+static inline void virtblk_revalidate_zones(struct virtio_blk *vblk)
+{
+}
+static inline int virtblk_probe_zoned_device(struct virtio_device *vdev,
+			struct virtio_blk *vblk, struct request_queue *q)
+{
+	return -EOPNOTSUPP;
+}
+
+static inline bool virtblk_has_zoned_feature(struct virtio_device *vdev)
+{
+	return false;
+}
+#endif /* CONFIG_BLK_DEV_ZONED */
+
 /* return id (s/n) string for *disk to *id_str
  */
 static int virtblk_get_id(struct gendisk *disk, char *id_str)
···
 	struct virtio_blk *vblk = disk->private_data;
 	struct request_queue *q = vblk->disk->queue;
 	struct request *req;
+	struct virtblk_req *vbr;
 	int err;
 
 	req = blk_mq_alloc_request(q, REQ_OP_DRV_IN, 0);
 	if (IS_ERR(req))
 		return PTR_ERR(req);
 
+	vbr = blk_mq_rq_to_pdu(req);
+	vbr->in_hdr_len = sizeof(vbr->status);
+	vbr->out_hdr.type = cpu_to_virtio32(vblk->vdev, VIRTIO_BLK_T_GET_ID);
+	vbr->out_hdr.sector = 0;
+
 	err = blk_rq_map_kern(q, req, id_str, VIRTIO_BLK_ID_BYTES, GFP_KERNEL);
 	if (err)
 		goto out;
 
 	blk_execute_rq(req, false);
-	err = blk_status_to_errno(virtblk_result(blk_mq_rq_to_pdu(req)));
+	err = blk_status_to_errno(virtblk_result(vbr->status));
 out:
 	blk_mq_free_request(req);
 	return err;
···
 	.owner = THIS_MODULE,
 	.getgeo = virtblk_getgeo,
 	.free_disk = virtblk_free_disk,
+	.report_zones = virtblk_report_zones,
 };
 
 static int index_to_minor(int index)
···
 	struct virtio_blk *vblk =
 		container_of(work, struct virtio_blk, config_work);
 
+	virtblk_revalidate_zones(vblk);
 	virtblk_update_capacity(vblk, true);
 }
 
···
 	}
 }
 
-static void virtblk_complete_batch(struct io_comp_batch *iob)
-{
-	struct request *req;
-
-	rq_list_for_each(&iob->req_list, req) {
-		virtblk_unmap_data(req, blk_mq_rq_to_pdu(req));
-		virtblk_cleanup_cmd(req);
-	}
-	blk_mq_end_request_batch(iob);
-}
-
 static int virtblk_poll(struct blk_mq_hw_ctx *hctx, struct io_comp_batch *iob)
 {
 	struct virtio_blk *vblk = hctx->queue->queuedata;
 	struct virtio_blk_vq *vq = get_virtio_blk_vq(hctx);
-	struct virtblk_req *vbr;
 	unsigned long flags;
-	unsigned int len;
 	int found = 0;
 
 	spin_lock_irqsave(&vq->lock, flags);
-
-	while ((vbr = virtqueue_get_buf(vq->vq, &len)) != NULL) {
-		struct request *req = blk_mq_rq_from_pdu(vbr);
-
-		found++;
-		if (!blk_mq_add_to_batch(req, iob, vbr->status,
-					 virtblk_complete_batch))
-			blk_mq_complete_request(req);
-	}
+	found = virtblk_handle_req(vq, iob);
 
 	if (found)
 		blk_mq_start_stopped_hw_queues(vblk->disk->queue, true);
···
 	virtblk_update_capacity(vblk, false);
 	virtio_device_ready(vdev);
 
+	if (virtblk_has_zoned_feature(vdev)) {
+		err = virtblk_probe_zoned_device(vdev, vblk, q);
+		if (err)
+			goto out_cleanup_disk;
+	}
+
+	dev_info(&vdev->dev, "blk config size: %zu\n",
+		 sizeof(struct virtio_blk_config));
+
 	err = device_add_disk(&vdev->dev, vblk->disk, virtblk_attr_groups);
 	if (err)
 		goto out_cleanup_disk;
···
 	VIRTIO_BLK_F_FLUSH, VIRTIO_BLK_F_TOPOLOGY, VIRTIO_BLK_F_CONFIG_WCE,
 	VIRTIO_BLK_F_MQ, VIRTIO_BLK_F_DISCARD, VIRTIO_BLK_F_WRITE_ZEROES,
 	VIRTIO_BLK_F_SECURE_ERASE,
+#ifdef CONFIG_BLK_DEV_ZONED
+	VIRTIO_BLK_F_ZONED,
+#endif /* CONFIG_BLK_DEV_ZONED */
 };
 
 static struct virtio_driver virtio_blk = {
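Annotation on the `virtblk_req` change in virtio_blk.c: the new in-header is a union whose comment requires both `status` fields to sit at the same offset, and `in_hdr_len` selects how many bytes the device may write. The sketch below restates that layout rule in user-space C so it can be checked at compile time. It is an illustration only: `struct toy_req`, `enum toy_op`, `toy_in_hdr_len()` and `toy_status()` are hypothetical stand-ins, and the out header is reduced to a single placeholder field.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for struct virtblk_req. */
struct toy_req {
	uint32_t out_hdr_type;	/* placeholder for the out header */

	/* In header: one status byte, or the extended zone-append form. */
	union {
		uint8_t status;
		struct {
			uint8_t status;
			uint8_t reserved[7];
			uint64_t append_sector;	/* little-endian on the wire */
		} zone_append_in_hdr;
	};

	size_t in_hdr_len;
};

/* Both layouts must expose the status byte at the same address. */
_Static_assert(offsetof(struct toy_req, status) ==
	       offsetof(struct toy_req, zone_append_in_hdr.status),
	       "status must alias zone_append_in_hdr.status");

enum toy_op { TOY_OP_READ, TOY_OP_ZONE_APPEND };

/* Number of device-writable header bytes to expose for a request type. */
static size_t toy_in_hdr_len(enum toy_op op)
{
	const struct toy_req *r = NULL;	/* used only inside sizeof */

	return op == TOY_OP_ZONE_APPEND ? sizeof(r->zone_append_in_hdr)
					: sizeof(r->status);
}

/* Completion code can read the status without knowing the request type. */
static uint8_t toy_status(const struct toy_req *r)
{
	assert(r);
	return r->status;
}
```

Because the status byte aliases in both layouts, the scatterlist for the in-header can always start at the address of `status`, with only its length changing between ordinary requests and zone append.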

+9 -2

drivers/nvdimm/virtio_pmem.c

···
 static int virtio_pmem_probe(struct virtio_device *vdev)
 {
 	struct nd_region_desc ndr_desc = {};
-	int nid = dev_to_node(&vdev->dev);
 	struct nd_region *nd_region;
 	struct virtio_pmem *vpmem;
 	struct resource res;
···
 	dev_set_drvdata(&vdev->dev, vpmem->nvdimm_bus);

 	ndr_desc.res = &res;
-	ndr_desc.numa_node = nid;
+
+	ndr_desc.numa_node = memory_add_physaddr_to_nid(res.start);
+	ndr_desc.target_node = phys_to_target_node(res.start);
+	if (ndr_desc.target_node == NUMA_NO_NODE) {
+		ndr_desc.target_node = ndr_desc.numa_node;
+		dev_dbg(&vdev->dev, "changing target node from %d to %d",
+			NUMA_NO_NODE, ndr_desc.target_node);
+	}
+
 	ndr_desc.flush = async_pmem_flush;
 	ndr_desc.provider_data = vdev;
 	set_bit(ND_REGION_PAGEMAP, &ndr_desc.flags);
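
The node selection in this hunk can be read as a small fallback rule: take both nodes from the physical address, and reuse the NUMA node when no target node is known. A minimal userspace sketch of that rule follows; `struct region_nodes` and `pick_nodes()` are hypothetical names used only for illustration, and the two inputs stand in for the results of `memory_add_physaddr_to_nid()` and `phys_to_target_node()`.

```c
#include <assert.h>

#define NUMA_NO_NODE (-1)

/* Hypothetical stand-in for the two ndr_desc fields the hunk assigns. */
struct region_nodes {
	int numa_node;
	int target_node;
};

/*
 * Mirrors the fallback in the hunk: when no target node is known for the
 * range, reuse the node derived from the physical address.
 */
static struct region_nodes pick_nodes(int nid_from_addr, int target_from_addr)
{
	struct region_nodes n = {
		.numa_node = nid_from_addr,
		.target_node = target_from_addr,
	};

	if (n.target_node == NUMA_NO_NODE)
		n.target_node = n.numa_node;
	return n;
}
```
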

+8

drivers/pci/quirks.c

···
 DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x1502, quirk_no_flr);
 DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x1503, quirk_no_flr);

+/* FLR may cause the SolidRun SNET DPU (rev 0x1) to hang */
+static void quirk_no_flr_snet(struct pci_dev *dev)
+{
+	if (dev->revision == 0x1)
+		quirk_no_flr(dev);
+}
+DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_SOLIDRUN, 0x1000, quirk_no_flr_snet);
+
 static void quirk_no_ext_tags(struct pci_dev *pdev)
 {
 	struct pci_host_bridge *bridge = pci_find_host_bridge(pdev->bus);

+12 -2

drivers/scsi/virtio_scsi.c

···
 	scsi_device_put(sdev);
 }

-static void virtscsi_rescan_hotunplug(struct virtio_scsi *vscsi)
+static int virtscsi_rescan_hotunplug(struct virtio_scsi *vscsi)
 {
 	struct scsi_device *sdev;
 	struct Scsi_Host *shost = virtio_scsi_host(vscsi->vdev);
 	unsigned char scsi_cmd[MAX_COMMAND_SIZE];
 	int result, inquiry_len, inq_result_len = 256;
 	char *inq_result = kmalloc(inq_result_len, GFP_KERNEL);
+
+	if (!inq_result) {
+		kfree(inq_result);
+		return -ENOMEM;
+	}

 	shost_for_each_device(sdev, shost) {
 		inquiry_len = sdev->inquiry_len ? sdev->inquiry_len : 36;
···
 	}

 	kfree(inq_result);
+	return 0;
 }

 static void virtscsi_handle_event(struct work_struct *work)
···

 	if (event->event &
 	    cpu_to_virtio32(vscsi->vdev, VIRTIO_SCSI_T_EVENTS_MISSED)) {
+		int ret;
+
 		event->event &= ~cpu_to_virtio32(vscsi->vdev,
 						   VIRTIO_SCSI_T_EVENTS_MISSED);
-		virtscsi_rescan_hotunplug(vscsi);
+		ret = virtscsi_rescan_hotunplug(vscsi);
+		if (ret)
+			return;
 		scsi_scan_host(virtio_scsi_host(vscsi->vdev));
 	}

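
The pattern this hunk introduces (check the allocation, report `-ENOMEM`, let the caller bail out) can be sketched in userspace as below. The names `rescan_with_buffer()` and `failing_alloc()` are hypothetical and exist only for this illustration; `malloc`/`free` stand in for `kmalloc`/`kfree`. Note that the `kfree(inq_result)` on the failure path in the hunk frees a NULL pointer, which is a no-op in the kernel, so the sketch simply returns.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/*
 * Hypothetical userspace analogue of the rescan helper: allocate a scratch
 * buffer, fail with -ENOMEM if that is not possible, otherwise do the work
 * and release the buffer. The allocator is passed in so the failure path
 * can be exercised.
 */
static int rescan_with_buffer(size_t len, void *(*alloc)(size_t))
{
	char *buf = alloc(len);

	if (!buf)
		return -ENOMEM;	/* nothing to free: buf is NULL */

	/* ... the per-device inquiry loop would use buf here ... */

	free(buf);
	return 0;
}

/* Allocator that always fails, used to simulate memory pressure. */
static void *failing_alloc(size_t len)
{
	(void)len;
	return NULL;
}
```

A caller would mirror `virtscsi_handle_event()`: store the return value and skip the follow-up scan when it is non-zero.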

+30

drivers/vdpa/Kconfig

···
 	  be executed by the hardware. It also supports a variety of stateless
 	  offloads depending on the actual device used and firmware version.

+config MLX5_VDPA_STEERING_DEBUG
+	bool "expose steering counters on debugfs"
+	select MLX5_VDPA
+	help
+	  Expose RX steering counters in debugfs to aid in debugging. For each VLAN
+	  or non VLAN interface, two hardware counters are added to the RX flow
+	  table: one for unicast and one for multicast.
+	  The counters counts the number of packets and bytes and exposes them in
+	  debugfs. Once can read the counters using, e.g.:
+	  cat /sys/kernel/debug/mlx5/mlx5_core.sf.1/vdpa-0/rx/untagged/ucast/packets
+	  cat /sys/kernel/debug/mlx5/mlx5_core.sf.1/vdpa-0/rx/untagged/mcast/bytes
+
 config VP_VDPA
 	tristate "Virtio PCI bridge vDPA driver"
 	select VIRTIO_PCI_LIB
···
 	help
 	  VDPA driver for Alibaba ENI (Elastic Network Interface) which is built upon
 	  virtio 0.9.5 specification.
+
+config SNET_VDPA
+	tristate "SolidRun's vDPA driver for SolidNET"
+	depends on PCI_MSI && PCI_IOV && (HWMON || HWMON=n)
+
+	# This driver MAY create a HWMON device.
+	# Depending on (HWMON || HWMON=n) ensures that:
+	# If HWMON=n the driver can be compiled either as a module or built-in.
+	# If HWMON=y the driver can be compiled either as a module or built-in.
+	# If HWMON=m the driver is forced to be compiled as a module.
+	# By doing so, IS_ENABLED can be used instead of IS_REACHABLE
+
+	help
+	  vDPA driver for SolidNET DPU.
+	  With this driver, the VirtIO dataplane can be
+	  offloaded to a SolidNET DPU.
+	  This driver includes a HW monitor device that
+	  reads health values from the DPU.

 endif # VDPA

+1

drivers/vdpa/Makefile

···
 obj-$(CONFIG_MLX5_VDPA) += mlx5/
 obj-$(CONFIG_VP_VDPA) += virtio_pci/
 obj-$(CONFIG_ALIBABA_ENI_VDPA) += alibaba/
+obj-$(CONFIG_SNET_VDPA) += solidrun/

+8 -24

drivers/vdpa/ifcvf/ifcvf_base.c

···

 #include "ifcvf_base.h"

-struct ifcvf_adapter *vf_to_adapter(struct ifcvf_hw *hw)
-{
-	return container_of(hw, struct ifcvf_adapter, vf);
-}
-
 u16 ifcvf_set_vq_vector(struct ifcvf_hw *hw, u16 qid, int vector)
 {
 	struct virtio_pci_common_cfg __iomem *cfg = hw->common_cfg;
···
 static void __iomem *get_cap_addr(struct ifcvf_hw *hw,
 				  struct virtio_pci_cap *cap)
 {
-	struct ifcvf_adapter *ifcvf;
-	struct pci_dev *pdev;
 	u32 length, offset;
 	u8 bar;

···
 	offset = le32_to_cpu(cap->offset);
 	bar = cap->bar;

-	ifcvf= vf_to_adapter(hw);
-	pdev = ifcvf->pdev;
-
 	if (bar >= IFCVF_PCI_MAX_RESOURCE) {
-		IFCVF_DBG(pdev,
+		IFCVF_DBG(hw->pdev,
 			  "Invalid bar number %u to get capabilities\n", bar);
 		return NULL;
 	}

-	if (offset + length > pci_resource_len(pdev, bar)) {
-		IFCVF_DBG(pdev,
+	if (offset + length > pci_resource_len(hw->pdev, bar)) {
+		IFCVF_DBG(hw->pdev,
 			  "offset(%u) + len(%u) overflows bar%u's capability\n",
 			  offset, length, bar);
 		return NULL;
···
 		IFCVF_ERR(pdev, "Failed to read PCI capability list\n");
 		return -EIO;
 	}
+	hw->pdev = pdev;

 	while (pos) {
 		ret = ifcvf_read_config_range(pdev, (u32 *)&cap,
···

 u64 ifcvf_get_features(struct ifcvf_hw *hw)
 {
-	return hw->hw_features;
+	return hw->dev_features;
 }

 int ifcvf_verify_min_features(struct ifcvf_hw *hw, u64 features)
 {
-	struct ifcvf_adapter *ifcvf = vf_to_adapter(hw);
-
 	if (!(features & BIT_ULL(VIRTIO_F_ACCESS_PLATFORM)) && features) {
-		IFCVF_ERR(ifcvf->pdev, "VIRTIO_F_ACCESS_PLATFORM is not negotiated\n");
+		IFCVF_ERR(hw->pdev, "VIRTIO_F_ACCESS_PLATFORM is not negotiated\n");
 		return -EINVAL;
 	}

···

 u32 ifcvf_get_config_size(struct ifcvf_hw *hw)
 {
-	struct ifcvf_adapter *adapter;
 	u32 net_config_size = sizeof(struct virtio_net_config);
 	u32 blk_config_size = sizeof(struct virtio_blk_config);
 	u32 cap_size = hw->cap_dev_config_size;
 	u32 config_size;

-	adapter = vf_to_adapter(hw);
 	/* If the onboard device config space size is greater than
 	 * the size of struct virtio_net/blk_config, only the spec
 	 * implementing contents size is returned, this is very
···
 		break;
 	default:
 		config_size = 0;
-		IFCVF_ERR(adapter->pdev, "VIRTIO ID %u not supported\n", hw->dev_type);
+		IFCVF_ERR(hw->pdev, "VIRTIO ID %u not supported\n", hw->dev_type);
 	}

 	return config_size;
···

 static int ifcvf_config_features(struct ifcvf_hw *hw)
 {
-	struct ifcvf_adapter *ifcvf;
-
-	ifcvf = vf_to_adapter(hw);
 	ifcvf_set_features(hw, hw->req_features);
 	ifcvf_add_status(hw, VIRTIO_CONFIG_S_FEATURES_OK);

 	if (!(ifcvf_get_status(hw) & VIRTIO_CONFIG_S_FEATURES_OK)) {
-		IFCVF_ERR(ifcvf->pdev, "Failed to set FEATURES_OK status\n");
+		IFCVF_ERR(hw->pdev, "Failed to set FEATURES_OK status\n");
 		return -EIO;
 	}


+6 -4

drivers/vdpa/ifcvf/ifcvf_base.h

···
 #include <uapi/linux/virtio_blk.h>
 #include <uapi/linux/virtio_config.h>
 #include <uapi/linux/virtio_pci.h>
+#include <uapi/linux/vdpa.h>

 #define N3000_DEVICE_ID 0x1041
 #define N3000_SUBSYS_DEVICE_ID 0x001A
···
 #define IFCVF_ERR(pdev, fmt, ...) dev_err(&pdev->dev, fmt, ##__VA_ARGS__)
 #define IFCVF_DBG(pdev, fmt, ...) dev_dbg(&pdev->dev, fmt, ##__VA_ARGS__)
 #define IFCVF_INFO(pdev, fmt, ...) dev_info(&pdev->dev, fmt, ##__VA_ARGS__)
-
-#define ifcvf_private_to_vf(adapter) \
-	(&((struct ifcvf_adapter *)adapter)->vf)

 /* all vqs and config interrupt has its own vector */
 #define MSIX_VECTOR_PER_VQ_AND_CONFIG 1
···
 	u32 dev_type;
 	u64 req_features;
 	u64 hw_features;
+	/* provisioned device features */
+	u64 dev_features;
 	struct virtio_pci_common_cfg __iomem *common_cfg;
 	void __iomem *dev_cfg;
 	struct vring_info vring[IFCVF_MAX_QUEUES];
···
 	u16 nr_vring;
 	/* VIRTIO_PCI_CAP_DEVICE_CFG size */
 	u32 cap_dev_config_size;
+	struct pci_dev *pdev;
 };

 struct ifcvf_adapter {
 	struct vdpa_device vdpa;
 	struct pci_dev *pdev;
-	struct ifcvf_hw vf;
+	struct ifcvf_hw *vf;
 };

 struct ifcvf_vring_lm_cfg {
···

 struct ifcvf_vdpa_mgmt_dev {
 	struct vdpa_mgmt_dev mdev;
+	struct ifcvf_hw vf;
 	struct ifcvf_adapter *adapter;
 	struct pci_dev *pdev;
 };

+77 -85

drivers/vdpa/ifcvf/ifcvf_main.c

··· 69 69 pci_free_irq_vectors(data); 70 70 } 71 71 72 - static void ifcvf_free_per_vq_irq(struct ifcvf_adapter *adapter) 72 + static void ifcvf_free_per_vq_irq(struct ifcvf_hw *vf) 73 73 { 74 - struct pci_dev *pdev = adapter->pdev; 75 - struct ifcvf_hw *vf = &adapter->vf; 74 + struct pci_dev *pdev = vf->pdev; 76 75 int i; 77 76 78 77 for (i = 0; i < vf->nr_vring; i++) { ··· 82 83 } 83 84 } 84 85 85 - static void ifcvf_free_vqs_reused_irq(struct ifcvf_adapter *adapter) 86 + static void ifcvf_free_vqs_reused_irq(struct ifcvf_hw *vf) 86 87 { 87 - struct pci_dev *pdev = adapter->pdev; 88 - struct ifcvf_hw *vf = &adapter->vf; 88 + struct pci_dev *pdev = vf->pdev; 89 89 90 90 if (vf->vqs_reused_irq != -EINVAL) { 91 91 devm_free_irq(&pdev->dev, vf->vqs_reused_irq, vf); ··· 93 95 94 96 } 95 97 96 - static void ifcvf_free_vq_irq(struct ifcvf_adapter *adapter) 98 + static void ifcvf_free_vq_irq(struct ifcvf_hw *vf) 97 99 { 98 - struct ifcvf_hw *vf = &adapter->vf; 99 - 100 100 if (vf->msix_vector_status == MSIX_VECTOR_PER_VQ_AND_CONFIG) 101 - ifcvf_free_per_vq_irq(adapter); 101 + ifcvf_free_per_vq_irq(vf); 102 102 else 103 - ifcvf_free_vqs_reused_irq(adapter); 103 + ifcvf_free_vqs_reused_irq(vf); 104 104 } 105 105 106 - static void ifcvf_free_config_irq(struct ifcvf_adapter *adapter) 106 + static void ifcvf_free_config_irq(struct ifcvf_hw *vf) 107 107 { 108 - struct pci_dev *pdev = adapter->pdev; 109 - struct ifcvf_hw *vf = &adapter->vf; 108 + struct pci_dev *pdev = vf->pdev; 110 109 111 110 if (vf->config_irq == -EINVAL) 112 111 return; ··· 118 123 } 119 124 } 120 125 121 - static void ifcvf_free_irq(struct ifcvf_adapter *adapter) 126 + static void ifcvf_free_irq(struct ifcvf_hw *vf) 122 127 { 123 - struct pci_dev *pdev = adapter->pdev; 128 + struct pci_dev *pdev = vf->pdev; 124 129 125 - ifcvf_free_vq_irq(adapter); 126 - ifcvf_free_config_irq(adapter); 130 + ifcvf_free_vq_irq(vf); 131 + ifcvf_free_config_irq(vf); 127 132 ifcvf_free_irq_vectors(pdev); 128 133 } 129 134 ··· 
132 137 * It returns the number of allocated vectors, negative 133 138 * return value when fails. 134 139 */ 135 - static int ifcvf_alloc_vectors(struct ifcvf_adapter *adapter) 140 + static int ifcvf_alloc_vectors(struct ifcvf_hw *vf) 136 141 { 137 - struct pci_dev *pdev = adapter->pdev; 138 - struct ifcvf_hw *vf = &adapter->vf; 142 + struct pci_dev *pdev = vf->pdev; 139 143 int max_intr, ret; 140 144 141 145 /* all queues and config interrupt */ ··· 154 160 return ret; 155 161 } 156 162 157 - static int ifcvf_request_per_vq_irq(struct ifcvf_adapter *adapter) 163 + static int ifcvf_request_per_vq_irq(struct ifcvf_hw *vf) 158 164 { 159 - struct pci_dev *pdev = adapter->pdev; 160 - struct ifcvf_hw *vf = &adapter->vf; 165 + struct pci_dev *pdev = vf->pdev; 161 166 int i, vector, ret, irq; 162 167 163 168 vf->vqs_reused_irq = -EINVAL; ··· 183 190 184 191 return 0; 185 192 err: 186 - ifcvf_free_irq(adapter); 193 + ifcvf_free_irq(vf); 187 194 188 195 return -EFAULT; 189 196 } 190 197 191 - static int ifcvf_request_vqs_reused_irq(struct ifcvf_adapter *adapter) 198 + static int ifcvf_request_vqs_reused_irq(struct ifcvf_hw *vf) 192 199 { 193 - struct pci_dev *pdev = adapter->pdev; 194 - struct ifcvf_hw *vf = &adapter->vf; 200 + struct pci_dev *pdev = vf->pdev; 195 201 int i, vector, ret, irq; 196 202 197 203 vector = 0; ··· 216 224 217 225 return 0; 218 226 err: 219 - ifcvf_free_irq(adapter); 227 + ifcvf_free_irq(vf); 220 228 221 229 return -EFAULT; 222 230 } 223 231 224 - static int ifcvf_request_dev_irq(struct ifcvf_adapter *adapter) 232 + static int ifcvf_request_dev_irq(struct ifcvf_hw *vf) 225 233 { 226 - struct pci_dev *pdev = adapter->pdev; 227 - struct ifcvf_hw *vf = &adapter->vf; 234 + struct pci_dev *pdev = vf->pdev; 228 235 int i, vector, ret, irq; 229 236 230 237 vector = 0; ··· 256 265 257 266 return 0; 258 267 err: 259 - ifcvf_free_irq(adapter); 268 + ifcvf_free_irq(vf); 260 269 261 270 return -EFAULT; 262 271 263 272 } 264 273 265 - static int 
ifcvf_request_vq_irq(struct ifcvf_adapter *adapter) 274 + static int ifcvf_request_vq_irq(struct ifcvf_hw *vf) 266 275 { 267 - struct ifcvf_hw *vf = &adapter->vf; 268 276 int ret; 269 277 270 278 if (vf->msix_vector_status == MSIX_VECTOR_PER_VQ_AND_CONFIG) 271 - ret = ifcvf_request_per_vq_irq(adapter); 279 + ret = ifcvf_request_per_vq_irq(vf); 272 280 else 273 - ret = ifcvf_request_vqs_reused_irq(adapter); 281 + ret = ifcvf_request_vqs_reused_irq(vf); 274 282 275 283 return ret; 276 284 } 277 285 278 - static int ifcvf_request_config_irq(struct ifcvf_adapter *adapter) 286 + static int ifcvf_request_config_irq(struct ifcvf_hw *vf) 279 287 { 280 - struct pci_dev *pdev = adapter->pdev; 281 - struct ifcvf_hw *vf = &adapter->vf; 288 + struct pci_dev *pdev = vf->pdev; 282 289 int config_vector, ret; 283 290 284 291 if (vf->msix_vector_status == MSIX_VECTOR_PER_VQ_AND_CONFIG) ··· 309 320 310 321 return 0; 311 322 err: 312 - ifcvf_free_irq(adapter); 323 + ifcvf_free_irq(vf); 313 324 314 325 return -EFAULT; 315 326 } 316 327 317 - static int ifcvf_request_irq(struct ifcvf_adapter *adapter) 328 + static int ifcvf_request_irq(struct ifcvf_hw *vf) 318 329 { 319 - struct ifcvf_hw *vf = &adapter->vf; 320 330 int nvectors, ret, max_intr; 321 331 322 - nvectors = ifcvf_alloc_vectors(adapter); 332 + nvectors = ifcvf_alloc_vectors(vf); 323 333 if (nvectors <= 0) 324 334 return -EFAULT; 325 335 ··· 329 341 330 342 if (nvectors == 1) { 331 343 vf->msix_vector_status = MSIX_VECTOR_DEV_SHARED; 332 - ret = ifcvf_request_dev_irq(adapter); 344 + ret = ifcvf_request_dev_irq(vf); 333 345 334 346 return ret; 335 347 } 336 348 337 - ret = ifcvf_request_vq_irq(adapter); 349 + ret = ifcvf_request_vq_irq(vf); 338 350 if (ret) 339 351 return ret; 340 352 341 - ret = ifcvf_request_config_irq(adapter); 353 + ret = ifcvf_request_config_irq(vf); 342 354 343 355 if (ret) 344 356 return ret; ··· 346 358 return 0; 347 359 } 348 360 349 - static int ifcvf_start_datapath(void *private) 361 + static int 
ifcvf_start_datapath(struct ifcvf_adapter *adapter) 350 362 { 351 - struct ifcvf_hw *vf = ifcvf_private_to_vf(private); 363 + struct ifcvf_hw *vf = adapter->vf; 352 364 u8 status; 353 365 int ret; 354 366 ··· 362 374 return ret; 363 375 } 364 376 365 - static int ifcvf_stop_datapath(void *private) 377 + static int ifcvf_stop_datapath(struct ifcvf_adapter *adapter) 366 378 { 367 - struct ifcvf_hw *vf = ifcvf_private_to_vf(private); 379 + struct ifcvf_hw *vf = adapter->vf; 368 380 int i; 369 381 370 382 for (i = 0; i < vf->nr_vring; i++) ··· 377 389 378 390 static void ifcvf_reset_vring(struct ifcvf_adapter *adapter) 379 391 { 380 - struct ifcvf_hw *vf = ifcvf_private_to_vf(adapter); 392 + struct ifcvf_hw *vf = adapter->vf; 381 393 int i; 382 394 383 395 for (i = 0; i < vf->nr_vring; i++) { ··· 402 414 { 403 415 struct ifcvf_adapter *adapter = vdpa_to_adapter(vdpa_dev); 404 416 405 - return &adapter->vf; 417 + return adapter->vf; 406 418 } 407 419 408 420 static u64 ifcvf_vdpa_get_device_features(struct vdpa_device *vdpa_dev) ··· 467 479 468 480 if ((status & VIRTIO_CONFIG_S_DRIVER_OK) && 469 481 !(status_old & VIRTIO_CONFIG_S_DRIVER_OK)) { 470 - ret = ifcvf_request_irq(adapter); 482 + ret = ifcvf_request_irq(vf); 471 483 if (ret) { 472 484 status = ifcvf_get_status(vf); 473 485 status |= VIRTIO_CONFIG_S_FAILED; ··· 499 511 500 512 if (status_old & VIRTIO_CONFIG_S_DRIVER_OK) { 501 513 ifcvf_stop_datapath(adapter); 502 - ifcvf_free_irq(adapter); 514 + ifcvf_free_irq(vf); 503 515 } 504 516 505 517 ifcvf_reset_vring(adapter); ··· 743 755 struct vdpa_device *vdpa_dev; 744 756 struct pci_dev *pdev; 745 757 struct ifcvf_hw *vf; 758 + u64 device_features; 746 759 int ret; 747 760 748 761 ifcvf_mgmt_dev = container_of(mdev, struct ifcvf_vdpa_mgmt_dev, mdev); 749 - if (!ifcvf_mgmt_dev->adapter) 750 - return -EOPNOTSUPP; 762 + vf = &ifcvf_mgmt_dev->vf; 763 + pdev = vf->pdev; 764 + adapter = vdpa_alloc_device(struct ifcvf_adapter, vdpa, 765 + &pdev->dev, &ifc_vdpa_ops, 1, 1, 
NULL, false); 766 + if (IS_ERR(adapter)) { 767 + IFCVF_ERR(pdev, "Failed to allocate vDPA structure"); 768 + return PTR_ERR(adapter); 769 + } 751 770 752 - adapter = ifcvf_mgmt_dev->adapter; 753 - vf = &adapter->vf; 754 - pdev = adapter->pdev; 771 + ifcvf_mgmt_dev->adapter = adapter; 772 + adapter->pdev = pdev; 773 + adapter->vdpa.dma_dev = &pdev->dev; 774 + adapter->vdpa.mdev = mdev; 775 + adapter->vf = vf; 755 776 vdpa_dev = &adapter->vdpa; 777 + 778 + device_features = vf->hw_features; 779 + if (config->mask & BIT_ULL(VDPA_ATTR_DEV_FEATURES)) { 780 + if (config->device_features & ~device_features) { 781 + IFCVF_ERR(pdev, "The provisioned features 0x%llx are not supported by this device with features 0x%llx\n", 782 + config->device_features, device_features); 783 + return -EINVAL; 784 + } 785 + device_features &= config->device_features; 786 + } 787 + vf->dev_features = device_features; 756 788 757 789 if (name) 758 790 ret = dev_set_name(&vdpa_dev->dev, "%s", name); ··· 788 780 789 781 return 0; 790 782 } 791 - 792 783 793 784 static void ifcvf_vdpa_dev_del(struct vdpa_mgmt_dev *mdev, struct vdpa_device *dev) 794 785 { ··· 807 800 { 808 801 struct ifcvf_vdpa_mgmt_dev *ifcvf_mgmt_dev; 809 802 struct device *dev = &pdev->dev; 810 - struct ifcvf_adapter *adapter; 811 803 struct ifcvf_hw *vf; 812 804 u32 dev_type; 813 805 int ret, i; ··· 837 831 } 838 832 839 833 pci_set_master(pdev); 840 - 841 - adapter = vdpa_alloc_device(struct ifcvf_adapter, vdpa, 842 - dev, &ifc_vdpa_ops, 1, 1, NULL, false); 843 - if (IS_ERR(adapter)) { 844 - IFCVF_ERR(pdev, "Failed to allocate vDPA structure"); 845 - return PTR_ERR(adapter); 834 + ifcvf_mgmt_dev = kzalloc(sizeof(struct ifcvf_vdpa_mgmt_dev), GFP_KERNEL); 835 + if (!ifcvf_mgmt_dev) { 836 + IFCVF_ERR(pdev, "Failed to alloc memory for the vDPA management device\n"); 837 + return -ENOMEM; 846 838 } 847 839 848 - vf = &adapter->vf; 840 + vf = &ifcvf_mgmt_dev->vf; 849 841 vf->dev_type = get_dev_type(pdev); 850 842 vf->base = 
pcim_iomap_table(pdev); 851 - 852 - adapter->pdev = pdev; 853 - adapter->vdpa.dma_dev = &pdev->dev; 843 + vf->pdev = pdev; 854 844 855 845 ret = ifcvf_init_hw(vf, pdev); 856 846 if (ret) { ··· 859 857 860 858 vf->hw_features = ifcvf_get_hw_features(vf); 861 859 vf->config_size = ifcvf_get_config_size(vf); 862 - 863 - ifcvf_mgmt_dev = kzalloc(sizeof(struct ifcvf_vdpa_mgmt_dev), GFP_KERNEL); 864 - if (!ifcvf_mgmt_dev) { 865 - IFCVF_ERR(pdev, "Failed to alloc memory for the vDPA management device\n"); 866 - return -ENOMEM; 867 - } 868 - 869 - ifcvf_mgmt_dev->mdev.ops = &ifcvf_vdpa_mgmt_dev_ops; 870 - ifcvf_mgmt_dev->mdev.device = dev; 871 - ifcvf_mgmt_dev->adapter = adapter; 872 860 873 861 dev_type = get_dev_type(pdev); 874 862 switch (dev_type) { ··· 874 882 goto err; 875 883 } 876 884 885 + ifcvf_mgmt_dev->mdev.ops = &ifcvf_vdpa_mgmt_dev_ops; 886 + ifcvf_mgmt_dev->mdev.device = dev; 877 887 ifcvf_mgmt_dev->mdev.max_supported_vqs = vf->nr_vring; 878 888 ifcvf_mgmt_dev->mdev.supported_features = vf->hw_features; 879 - 880 - adapter->vdpa.mdev = &ifcvf_mgmt_dev->mdev; 881 - 889 + ifcvf_mgmt_dev->mdev.config_attr_mask = (1 << VDPA_ATTR_DEV_FEATURES); 882 890 883 891 ret = vdpa_mgmtdev_register(&ifcvf_mgmt_dev->mdev); 884 892 if (ret) {
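
The feature-provisioning check added to the device-add path in this file reduces to a subset test on 64-bit feature masks: reject any requested bit the hardware does not offer, otherwise narrow the device features to the request. A minimal sketch under that reading follows; `provision_features()` and its parameters are hypothetical names for illustration and are not part of the driver.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/*
 * Hypothetical helper modelled on the hunk: hw_features is what the
 * hardware offers, requested is the provisioned set (only consulted when
 * have_request is non-zero). On success the effective set is stored in
 * *out and 0 is returned; a request for an unsupported bit yields -EINVAL.
 */
static int provision_features(uint64_t hw_features, int have_request,
			      uint64_t requested, uint64_t *out)
{
	uint64_t features = hw_features;

	if (have_request) {
		if (requested & ~hw_features)
			return -EINVAL;	/* asks for a bit the device lacks */
		features &= requested;
	}

	*out = features;
	return 0;
}
```

With no request, the effective set is simply the hardware set, which matches the default `device_features = vf->hw_features` in the hunk.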

+1 -1

drivers/vdpa/mlx5/Makefile

···
 subdir-ccflags-y += -I$(srctree)/drivers/vdpa/mlx5/core

 obj-$(CONFIG_MLX5_VDPA_NET) += mlx5_vdpa.o
-mlx5_vdpa-$(CONFIG_MLX5_VDPA_NET) += net/mlx5_vnet.o core/resources.o core/mr.o
+mlx5_vdpa-$(CONFIG_MLX5_VDPA_NET) += net/mlx5_vnet.o core/resources.o core/mr.o net/debug.o

-1

drivers/vdpa/mlx5/core/mr.c

···
 	else
 		destroy_dma_mr(mvdev, mr);

-	memset(mr, 0, sizeof(*mr));
 	mr->initialized = false;
 out:
 	mutex_unlock(&mr->mkey_mtx);

+2 -1

drivers/vdpa/mlx5/core/resources.c

···
 		return err;

 	mkey_index = MLX5_GET(create_mkey_out, lout, mkey_index);
-	*mkey |= mlx5_idx_to_mkey(mkey_index);
+	*mkey = mlx5_idx_to_mkey(mkey_index);
 	return 0;
 }

···
 	if (!mvdev->cvq.iotlb)
 		return -ENOMEM;

+	spin_lock_init(&mvdev->cvq.iommu_lock);
 	vringh_set_iotlb(&mvdev->cvq.vring, mvdev->cvq.iotlb, &mvdev->cvq.iommu_lock);

 	return 0;
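
The one-character change from `|=` to `=` matters whenever the output variable may already hold a value. The sketch below shows the difference with a stale input; `idx_to_mkey()` is a stand-in for `mlx5_idx_to_mkey()`, and its 8-bit shift is an assumption made for the illustration, as are the helper names `store_or()` and `store_assign()`.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed stand-in for mlx5_idx_to_mkey(); the shift width is illustrative. */
static uint32_t idx_to_mkey(uint32_t idx)
{
	return idx << 8;
}

/* Old behaviour: bits already present in *mkey leak into the result. */
static void store_or(uint32_t *mkey, uint32_t idx)
{
	*mkey |= idx_to_mkey(idx);
}

/* New behaviour: the result depends only on the index. */
static void store_assign(uint32_t *mkey, uint32_t idx)
{
	*mkey = idx_to_mkey(idx);
}
```
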

+152

drivers/vdpa/mlx5/net/debug.c

··· 1 + // SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB 2 + /* Copyright (c) 2022, NVIDIA CORPORATION & AFFILIATES. All rights reserved. */ 3 + 4 + #include <linux/debugfs.h> 5 + #include <linux/mlx5/fs.h> 6 + #include "mlx5_vnet.h" 7 + 8 + static int tirn_show(struct seq_file *file, void *priv) 9 + { 10 + struct mlx5_vdpa_net *ndev = file->private; 11 + 12 + seq_printf(file, "0x%x\n", ndev->res.tirn); 13 + return 0; 14 + } 15 + 16 + DEFINE_SHOW_ATTRIBUTE(tirn); 17 + 18 + void mlx5_vdpa_remove_tirn(struct mlx5_vdpa_net *ndev) 19 + { 20 + if (ndev->debugfs) 21 + debugfs_remove(ndev->res.tirn_dent); 22 + } 23 + 24 + void mlx5_vdpa_add_tirn(struct mlx5_vdpa_net *ndev) 25 + { 26 + ndev->res.tirn_dent = debugfs_create_file("tirn", 0444, ndev->rx_dent, 27 + ndev, &tirn_fops); 28 + } 29 + 30 + static int rx_flow_table_show(struct seq_file *file, void *priv) 31 + { 32 + struct mlx5_vdpa_net *ndev = file->private; 33 + 34 + seq_printf(file, "0x%x\n", mlx5_flow_table_id(ndev->rxft)); 35 + return 0; 36 + } 37 + 38 + DEFINE_SHOW_ATTRIBUTE(rx_flow_table); 39 + 40 + void mlx5_vdpa_remove_rx_flow_table(struct mlx5_vdpa_net *ndev) 41 + { 42 + if (ndev->debugfs) 43 + debugfs_remove(ndev->rx_table_dent); 44 + } 45 + 46 + void mlx5_vdpa_add_rx_flow_table(struct mlx5_vdpa_net *ndev) 47 + { 48 + ndev->rx_table_dent = debugfs_create_file("table_id", 0444, ndev->rx_dent, 49 + ndev, &rx_flow_table_fops); 50 + } 51 + 52 + #if defined(CONFIG_MLX5_VDPA_STEERING_DEBUG) 53 + static int packets_show(struct seq_file *file, void *priv) 54 + { 55 + struct mlx5_vdpa_counter *counter = file->private; 56 + u64 packets; 57 + u64 bytes; 58 + int err; 59 + 60 + err = mlx5_fc_query(counter->mdev, counter->counter, &packets, &bytes); 61 + if (err) 62 + return err; 63 + 64 + seq_printf(file, "0x%llx\n", packets); 65 + return 0; 66 + } 67 + 68 + static int bytes_show(struct seq_file *file, void *priv) 69 + { 70 + struct mlx5_vdpa_counter *counter = file->private; 71 + u64 packets; 72 + u64 bytes; 73 + 
int err; 74 + 75 + err = mlx5_fc_query(counter->mdev, counter->counter, &packets, &bytes); 76 + if (err) 77 + return err; 78 + 79 + seq_printf(file, "0x%llx\n", bytes); 80 + return 0; 81 + } 82 + 83 + DEFINE_SHOW_ATTRIBUTE(packets); 84 + DEFINE_SHOW_ATTRIBUTE(bytes); 85 + 86 + static void add_counter_node(struct mlx5_vdpa_counter *counter, 87 + struct dentry *parent) 88 + { 89 + debugfs_create_file("packets", 0444, parent, counter, 90 + &packets_fops); 91 + debugfs_create_file("bytes", 0444, parent, counter, 92 + &bytes_fops); 93 + } 94 + 95 + void mlx5_vdpa_add_rx_counters(struct mlx5_vdpa_net *ndev, 96 + struct macvlan_node *node) 97 + { 98 + static const char *ut = "untagged"; 99 + char vidstr[9]; 100 + u16 vid; 101 + 102 + node->ucast_counter.mdev = ndev->mvdev.mdev; 103 + node->mcast_counter.mdev = ndev->mvdev.mdev; 104 + if (node->tagged) { 105 + vid = key2vid(node->macvlan); 106 + snprintf(vidstr, sizeof(vidstr), "0x%x", vid); 107 + } else { 108 + strcpy(vidstr, ut); 109 + } 110 + 111 + node->dent = debugfs_create_dir(vidstr, ndev->rx_dent); 112 + if (IS_ERR(node->dent)) { 113 + node->dent = NULL; 114 + return; 115 + } 116 + 117 + node->ucast_counter.dent = debugfs_create_dir("ucast", node->dent); 118 + if (IS_ERR(node->ucast_counter.dent)) 119 + return; 120 + 121 + add_counter_node(&node->ucast_counter, node->ucast_counter.dent); 122 + 123 + node->mcast_counter.dent = debugfs_create_dir("mcast", node->dent); 124 + if (IS_ERR(node->mcast_counter.dent)) 125 + return; 126 + 127 + add_counter_node(&node->mcast_counter, node->mcast_counter.dent); 128 + } 129 + 130 + void mlx5_vdpa_remove_rx_counters(struct mlx5_vdpa_net *ndev, 131 + struct macvlan_node *node) 132 + { 133 + if (node->dent && ndev->debugfs) 134 + debugfs_remove_recursive(node->dent); 135 + } 136 + #endif 137 + 138 + void mlx5_vdpa_add_debugfs(struct mlx5_vdpa_net *ndev) 139 + { 140 + struct mlx5_core_dev *mdev; 141 + 142 + mdev = ndev->mvdev.mdev; 143 + ndev->debugfs = 
debugfs_create_dir(dev_name(&ndev->mvdev.vdev.dev), 144 + mlx5_debugfs_get_dev_root(mdev)); 145 + if (!IS_ERR(ndev->debugfs)) 146 + ndev->rx_dent = debugfs_create_dir("rx", ndev->debugfs); 147 + } 148 + 149 + void mlx5_vdpa_remove_debugfs(struct dentry *dbg) 150 + { 151 + debugfs_remove_recursive(dbg); 152 + }
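
The 9-byte `vidstr` buffer in `mlx5_vdpa_add_rx_counters()` is sized for its longest possible content: "untagged" is 8 characters plus the terminator, and a 16-bit VLAN id printed as `0x%x` needs at most 6 characters plus the terminator. A userspace sketch of that naming logic follows; `format_node_name()` and `NODE_NAME_LEN` are hypothetical names used for illustration only.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define NODE_NAME_LEN 9	/* strlen("untagged") + 1 */

/*
 * Builds the per-node debugfs directory name the way the hunk does:
 * hexadecimal VLAN id for tagged nodes, the literal "untagged" otherwise.
 * out must point to at least NODE_NAME_LEN bytes.
 */
static void format_node_name(char *out, int tagged, uint16_t vid)
{
	if (tagged)
		snprintf(out, NODE_NAME_LEN, "0x%x", (unsigned int)vid);
	else
		strcpy(out, "untagged");	/* fills the buffer exactly */
}
```
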

+172 -89

drivers/vdpa/mlx5/net/mlx5_vnet.c

··· 18 18 #include <linux/mlx5/mlx5_ifc_vdpa.h> 19 19 #include <linux/mlx5/mpfs.h> 20 20 #include "mlx5_vdpa.h" 21 + #include "mlx5_vnet.h" 21 22 22 23 MODULE_AUTHOR("Eli Cohen <eli@mellanox.com>"); 23 24 MODULE_DESCRIPTION("Mellanox VDPA driver"); 24 25 MODULE_LICENSE("Dual BSD/GPL"); 25 - 26 - #define to_mlx5_vdpa_ndev(__mvdev) \ 27 - container_of(__mvdev, struct mlx5_vdpa_net, mvdev) 28 - #define to_mvdev(__vdev) container_of((__vdev), struct mlx5_vdpa_dev, vdev) 29 26 30 27 #define VALID_FEATURES_MASK \ 31 28 (BIT_ULL(VIRTIO_NET_F_CSUM) | BIT_ULL(VIRTIO_NET_F_GUEST_CSUM) | \ ··· 46 49 #define MLX5_FEATURE(_mvdev, _feature) (!!((_mvdev)->actual_features & BIT_ULL(_feature))) 47 50 48 51 #define MLX5V_UNTAGGED 0x1000 49 - 50 - struct mlx5_vdpa_net_resources { 51 - u32 tisn; 52 - u32 tdn; 53 - u32 tirn; 54 - u32 rqtn; 55 - bool valid; 56 - }; 57 52 58 53 struct mlx5_vdpa_cq_buf { 59 54 struct mlx5_frag_buf_ctrl fbc; ··· 134 145 135 146 return idx <= mvdev->max_idx; 136 147 } 137 - 138 - #define MLX5V_MACVLAN_SIZE 256 139 - 140 - struct mlx5_vdpa_net { 141 - struct mlx5_vdpa_dev mvdev; 142 - struct mlx5_vdpa_net_resources res; 143 - struct virtio_net_config config; 144 - struct mlx5_vdpa_virtqueue *vqs; 145 - struct vdpa_callback *event_cbs; 146 - 147 - /* Serialize vq resources creation and destruction. This is required 148 - * since memory map might change and we need to destroy and create 149 - * resources while driver in operational. 
150 - */ 151 - struct rw_semaphore reslock; 152 - struct mlx5_flow_table *rxft; 153 - bool setup; 154 - u32 cur_num_vqs; 155 - u32 rqt_size; 156 - bool nb_registered; 157 - struct notifier_block nb; 158 - struct vdpa_callback config_cb; 159 - struct mlx5_vdpa_wq_ent cvq_ent; 160 - struct hlist_head macvlan_hash[MLX5V_MACVLAN_SIZE]; 161 - }; 162 - 163 - struct macvlan_node { 164 - struct hlist_node hlist; 165 - struct mlx5_flow_handle *ucast_rule; 166 - struct mlx5_flow_handle *mcast_rule; 167 - u64 macvlan; 168 - }; 169 148 170 149 static void free_resources(struct mlx5_vdpa_net *ndev); 171 150 static void init_mvqs(struct mlx5_vdpa_net *ndev); ··· 1388 1431 1389 1432 err = mlx5_vdpa_create_tir(&ndev->mvdev, in, &ndev->res.tirn); 1390 1433 kfree(in); 1434 + if (err) 1435 + return err; 1436 + 1437 + mlx5_vdpa_add_tirn(ndev); 1391 1438 return err; 1392 1439 } 1393 1440 1394 1441 static void destroy_tir(struct mlx5_vdpa_net *ndev) 1395 1442 { 1443 + mlx5_vdpa_remove_tirn(ndev); 1396 1444 mlx5_vdpa_destroy_tir(&ndev->mvdev, ndev->res.tirn); 1397 1445 } 1398 1446 1399 1447 #define MAX_STEERING_ENT 0x8000 1400 1448 #define MAX_STEERING_GROUPS 2 1401 1449 1402 - static int mlx5_vdpa_add_mac_vlan_rules(struct mlx5_vdpa_net *ndev, u8 *mac, 1403 - u16 vid, bool tagged, 1404 - struct mlx5_flow_handle **ucast, 1405 - struct mlx5_flow_handle **mcast) 1450 + #if defined(CONFIG_MLX5_VDPA_STEERING_DEBUG) 1451 + #define NUM_DESTS 2 1452 + #else 1453 + #define NUM_DESTS 1 1454 + #endif 1455 + 1456 + static int add_steering_counters(struct mlx5_vdpa_net *ndev, 1457 + struct macvlan_node *node, 1458 + struct mlx5_flow_act *flow_act, 1459 + struct mlx5_flow_destination *dests) 1406 1460 { 1407 - struct mlx5_flow_destination dest = {}; 1461 + #if defined(CONFIG_MLX5_VDPA_STEERING_DEBUG) 1462 + int err; 1463 + 1464 + node->ucast_counter.counter = mlx5_fc_create(ndev->mvdev.mdev, false); 1465 + if (IS_ERR(node->ucast_counter.counter)) 1466 + return PTR_ERR(node->ucast_counter.counter); 
1467 + 1468 + node->mcast_counter.counter = mlx5_fc_create(ndev->mvdev.mdev, false); 1469 + if (IS_ERR(node->mcast_counter.counter)) { 1470 + err = PTR_ERR(node->mcast_counter.counter); 1471 + goto err_mcast_counter; 1472 + } 1473 + 1474 + dests[1].type = MLX5_FLOW_DESTINATION_TYPE_COUNTER; 1475 + flow_act->action |= MLX5_FLOW_CONTEXT_ACTION_COUNT; 1476 + return 0; 1477 + 1478 + err_mcast_counter: 1479 + mlx5_fc_destroy(ndev->mvdev.mdev, node->ucast_counter.counter); 1480 + return err; 1481 + #else 1482 + return 0; 1483 + #endif 1484 + } 1485 + 1486 + static void remove_steering_counters(struct mlx5_vdpa_net *ndev, 1487 + struct macvlan_node *node) 1488 + { 1489 + #if defined(CONFIG_MLX5_VDPA_STEERING_DEBUG) 1490 + mlx5_fc_destroy(ndev->mvdev.mdev, node->mcast_counter.counter); 1491 + mlx5_fc_destroy(ndev->mvdev.mdev, node->ucast_counter.counter); 1492 + #endif 1493 + } 1494 + 1495 + static int mlx5_vdpa_add_mac_vlan_rules(struct mlx5_vdpa_net *ndev, u8 *mac, 1496 + struct macvlan_node *node) 1497 + { 1498 + struct mlx5_flow_destination dests[NUM_DESTS] = {}; 1408 1499 struct mlx5_flow_act flow_act = {}; 1409 - struct mlx5_flow_handle *rule; 1410 1500 struct mlx5_flow_spec *spec; 1411 1501 void *headers_c; 1412 1502 void *headers_v; 1413 1503 u8 *dmac_c; 1414 1504 u8 *dmac_v; 1415 1505 int err; 1506 + u16 vid; 1416 1507 1417 1508 spec = kvzalloc(sizeof(*spec), GFP_KERNEL); 1418 1509 if (!spec) 1419 1510 return -ENOMEM; 1420 1511 1512 + vid = key2vid(node->macvlan); 1421 1513 spec->match_criteria_enable = MLX5_MATCH_OUTER_HEADERS; 1422 1514 headers_c = MLX5_ADDR_OF(fte_match_param, spec->match_criteria, outer_headers); 1423 1515 headers_v = MLX5_ADDR_OF(fte_match_param, spec->match_value, outer_headers); ··· 1478 1472 MLX5_SET(fte_match_set_lyr_2_4, headers_c, cvlan_tag, 1); 1479 1473 MLX5_SET_TO_ONES(fte_match_set_lyr_2_4, headers_c, first_vid); 1480 1474 } 1481 - if (tagged) { 1475 + if (node->tagged) { 1482 1476 MLX5_SET(fte_match_set_lyr_2_4, headers_v, 
cvlan_tag, 1); 1483 1477 MLX5_SET(fte_match_set_lyr_2_4, headers_v, first_vid, vid); 1484 1478 } 1485 1479 flow_act.action = MLX5_FLOW_CONTEXT_ACTION_FWD_DEST; 1486 - dest.type = MLX5_FLOW_DESTINATION_TYPE_TIR; 1487 - dest.tir_num = ndev->res.tirn; 1488 - rule = mlx5_add_flow_rules(ndev->rxft, spec, &flow_act, &dest, 1); 1489 - if (IS_ERR(rule)) 1490 - return PTR_ERR(rule); 1480 + dests[0].type = MLX5_FLOW_DESTINATION_TYPE_TIR; 1481 + dests[0].tir_num = ndev->res.tirn; 1482 + err = add_steering_counters(ndev, node, &flow_act, dests); 1483 + if (err) 1484 + goto out_free; 1491 1485 1492 - *ucast = rule; 1486 + #if defined(CONFIG_MLX5_VDPA_STEERING_DEBUG) 1487 + dests[1].counter_id = mlx5_fc_id(node->ucast_counter.counter); 1488 + #endif 1489 + node->ucast_rule = mlx5_add_flow_rules(ndev->rxft, spec, &flow_act, dests, NUM_DESTS); 1490 + if (IS_ERR(node->ucast_rule)) { 1491 + err = PTR_ERR(node->ucast_rule); 1492 + goto err_ucast; 1493 + } 1494 + 1495 + #if defined(CONFIG_MLX5_VDPA_STEERING_DEBUG) 1496 + dests[1].counter_id = mlx5_fc_id(node->mcast_counter.counter); 1497 + #endif 1493 1498 1494 1499 memset(dmac_c, 0, ETH_ALEN); 1495 1500 memset(dmac_v, 0, ETH_ALEN); 1496 1501 dmac_c[0] = 1; 1497 1502 dmac_v[0] = 1; 1498 - rule = mlx5_add_flow_rules(ndev->rxft, spec, &flow_act, &dest, 1); 1499 - kvfree(spec); 1500 - if (IS_ERR(rule)) { 1501 - err = PTR_ERR(rule); 1503 + node->mcast_rule = mlx5_add_flow_rules(ndev->rxft, spec, &flow_act, dests, NUM_DESTS); 1504 + if (IS_ERR(node->mcast_rule)) { 1505 + err = PTR_ERR(node->mcast_rule); 1502 1506 goto err_mcast; 1503 1507 } 1504 - 1505 - *mcast = rule; 1508 + kvfree(spec); 1509 + mlx5_vdpa_add_rx_counters(ndev, node); 1506 1510 return 0; 1507 1511 1508 1512 err_mcast: 1509 - mlx5_del_flow_rules(*ucast); 1513 + mlx5_del_flow_rules(node->ucast_rule); 1514 + err_ucast: 1515 + remove_steering_counters(ndev, node); 1516 + out_free: 1517 + kvfree(spec); 1510 1518 return err; 1511 1519 } 1512 1520 1513 1521 static void 
mlx5_vdpa_del_mac_vlan_rules(struct mlx5_vdpa_net *ndev, 1514 - struct mlx5_flow_handle *ucast, 1515 - struct mlx5_flow_handle *mcast) 1522 + struct macvlan_node *node) 1516 1523 { 1517 - mlx5_del_flow_rules(ucast); 1518 - mlx5_del_flow_rules(mcast); 1524 + mlx5_vdpa_remove_rx_counters(ndev, node); 1525 + mlx5_del_flow_rules(node->ucast_rule); 1526 + mlx5_del_flow_rules(node->mcast_rule); 1519 1527 } 1520 1528 1521 1529 static u64 search_val(u8 *mac, u16 vlan, bool tagged) ··· 1563 1543 return NULL; 1564 1544 } 1565 1545 1566 - static int mac_vlan_add(struct mlx5_vdpa_net *ndev, u8 *mac, u16 vlan, bool tagged) // vlan -> vid 1546 + static int mac_vlan_add(struct mlx5_vdpa_net *ndev, u8 *mac, u16 vid, bool tagged) 1567 1547 { 1568 1548 struct macvlan_node *ptr; 1569 1549 u64 val; 1570 1550 u32 idx; 1571 1551 int err; 1572 1552 1573 - val = search_val(mac, vlan, tagged); 1553 + val = search_val(mac, vid, tagged); 1574 1554 if (mac_vlan_lookup(ndev, val)) 1575 1555 return -EEXIST; 1576 1556 ··· 1578 1558 if (!ptr) 1579 1559 return -ENOMEM; 1580 1560 1581 - err = mlx5_vdpa_add_mac_vlan_rules(ndev, ndev->config.mac, vlan, tagged, 1582 - &ptr->ucast_rule, &ptr->mcast_rule); 1561 + ptr->tagged = tagged; 1562 + ptr->macvlan = val; 1563 + ptr->ndev = ndev; 1564 + err = mlx5_vdpa_add_mac_vlan_rules(ndev, ndev->config.mac, ptr); 1583 1565 if (err) 1584 1566 goto err_add; 1585 1567 1586 - ptr->macvlan = val; 1587 1568 idx = hash_64(val, 8); 1588 1569 hlist_add_head(&ptr->hlist, &ndev->macvlan_hash[idx]); 1589 1570 return 0; ··· 1603 1582 return; 1604 1583 1605 1584 hlist_del(&ptr->hlist); 1606 - mlx5_vdpa_del_mac_vlan_rules(ndev, ptr->ucast_rule, ptr->mcast_rule); 1585 + mlx5_vdpa_del_mac_vlan_rules(ndev, ptr); 1586 + remove_steering_counters(ndev, ptr); 1607 1587 kfree(ptr); 1608 1588 } 1609 1589 ··· 1617 1595 for (i = 0; i < MLX5V_MACVLAN_SIZE; i++) { 1618 1596 hlist_for_each_entry_safe(pos, n, &ndev->macvlan_hash[i], hlist) { 1619 1597 hlist_del(&pos->hlist); 1620 - 
mlx5_vdpa_del_mac_vlan_rules(ndev, pos->ucast_rule, pos->mcast_rule); 1598 + mlx5_vdpa_del_mac_vlan_rules(ndev, pos); 1599 + remove_steering_counters(ndev, pos); 1621 1600 kfree(pos); 1622 1601 } 1623 1602 } ··· 1644 1621 mlx5_vdpa_warn(&ndev->mvdev, "failed to create flow table\n"); 1645 1622 return PTR_ERR(ndev->rxft); 1646 1623 } 1624 + mlx5_vdpa_add_rx_flow_table(ndev); 1647 1625 1648 1626 err = mac_vlan_add(ndev, ndev->config.mac, 0, false); 1649 1627 if (err) ··· 1653 1629 return 0; 1654 1630 1655 1631 err_add: 1632 + mlx5_vdpa_remove_rx_flow_table(ndev); 1656 1633 mlx5_destroy_flow_table(ndev->rxft); 1657 1634 return err; 1658 1635 } ··· 1661 1636 static void teardown_steering(struct mlx5_vdpa_net *ndev) 1662 1637 { 1663 1638 clear_mac_vlan_table(ndev); 1639 + mlx5_vdpa_remove_rx_flow_table(ndev); 1664 1640 mlx5_destroy_flow_table(ndev->rxft); 1665 1641 } 1666 1642 ··· 2209 2183 mlx_vdpa_features |= BIT_ULL(VIRTIO_NET_F_STATUS); 2210 2184 mlx_vdpa_features |= BIT_ULL(VIRTIO_NET_F_MTU); 2211 2185 mlx_vdpa_features |= BIT_ULL(VIRTIO_NET_F_CTRL_VLAN); 2186 + mlx_vdpa_features |= BIT_ULL(VIRTIO_NET_F_MAC); 2212 2187 2213 2188 return mlx_vdpa_features; 2214 2189 } ··· 2682 2655 return err; 2683 2656 } 2684 2657 2658 + static struct device *mlx5_get_vq_dma_dev(struct vdpa_device *vdev, u16 idx) 2659 + { 2660 + struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev); 2661 + 2662 + if (is_ctrl_vq_idx(mvdev, idx)) 2663 + return &vdev->dev; 2664 + 2665 + return mvdev->vdev.dma_dev; 2666 + } 2667 + 2685 2668 static void mlx5_vdpa_free(struct vdpa_device *vdev) 2686 2669 { 2687 2670 struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev); ··· 2907 2870 .get_generation = mlx5_vdpa_get_generation, 2908 2871 .set_map = mlx5_vdpa_set_map, 2909 2872 .set_group_asid = mlx5_set_group_asid, 2873 + .get_vq_dma_dev = mlx5_get_vq_dma_dev, 2910 2874 .free = mlx5_vdpa_free, 2911 2875 .suspend = mlx5_vdpa_suspend, 2912 2876 }; ··· 3047 3009 struct mlx5_vdpa_wq_ent *wqent; 3048 3010 3049 3011 if (event == 
MLX5_EVENT_TYPE_PORT_CHANGE) { 3012 + if (!(ndev->mvdev.actual_features & BIT_ULL(VIRTIO_NET_F_STATUS))) 3013 + return NOTIFY_DONE; 3050 3014 switch (eqe->sub_type) { 3051 3015 case MLX5_PORT_CHANGE_SUBTYPE_DOWN: 3052 3016 case MLX5_PORT_CHANGE_SUBTYPE_ACTIVE: ··· 3100 3060 struct mlx5_vdpa_dev *mvdev; 3101 3061 struct mlx5_vdpa_net *ndev; 3102 3062 struct mlx5_core_dev *mdev; 3063 + u64 device_features; 3103 3064 u32 max_vqs; 3104 3065 u16 mtu; 3105 3066 int err; ··· 3109 3068 return -ENOSPC; 3110 3069 3111 3070 mdev = mgtdev->madev->mdev; 3071 + device_features = mgtdev->mgtdev.supported_features; 3072 + if (add_config->mask & BIT_ULL(VDPA_ATTR_DEV_FEATURES)) { 3073 + if (add_config->device_features & ~device_features) { 3074 + dev_warn(mdev->device, 3075 + "The provisioned features 0x%llx are not supported by this device with features 0x%llx\n", 3076 + add_config->device_features, device_features); 3077 + return -EINVAL; 3078 + } 3079 + device_features &= add_config->device_features; 3080 + } 3081 + if (!(device_features & BIT_ULL(VIRTIO_F_VERSION_1) && 3082 + device_features & BIT_ULL(VIRTIO_F_ACCESS_PLATFORM))) { 3083 + dev_warn(mdev->device, 3084 + "Must provision minimum features 0x%llx for this device", 3085 + BIT_ULL(VIRTIO_F_VERSION_1) | BIT_ULL(VIRTIO_F_ACCESS_PLATFORM)); 3086 + return -EOPNOTSUPP; 3087 + } 3088 + 3112 3089 if (!(MLX5_CAP_DEV_VDPA_EMULATION(mdev, virtio_queue_type) & 3113 3090 MLX5_VIRTIO_EMULATION_CAP_VIRTIO_QUEUE_TYPE_SPLIT)) { 3114 3091 dev_warn(mdev->device, "missing support for split virtqueues\n"); ··· 3155 3096 if (IS_ERR(ndev)) 3156 3097 return PTR_ERR(ndev); 3157 3098 3158 - ndev->mvdev.mlx_features = mgtdev->mgtdev.supported_features; 3159 3099 ndev->mvdev.max_vqs = max_vqs; 3160 3100 mvdev = &ndev->mvdev; 3161 3101 mvdev->mdev = mdev; ··· 3176 3118 goto err_alloc; 3177 3119 } 3178 3120 3179 - err = query_mtu(mdev, &mtu); 3180 - if (err) 3181 - goto err_alloc; 3121 + if (device_features & BIT_ULL(VIRTIO_NET_F_MTU)) { 3122 + err 
= query_mtu(mdev, &mtu); 3123 + if (err) 3124 + goto err_alloc; 3182 3125 3183 - ndev->config.mtu = cpu_to_mlx5vdpa16(mvdev, mtu); 3126 + ndev->config.mtu = cpu_to_mlx5vdpa16(mvdev, mtu); 3127 + } 3184 3128 3185 - if (get_link_state(mvdev)) 3186 - ndev->config.status |= cpu_to_mlx5vdpa16(mvdev, VIRTIO_NET_S_LINK_UP); 3187 - else 3188 - ndev->config.status &= cpu_to_mlx5vdpa16(mvdev, ~VIRTIO_NET_S_LINK_UP); 3129 + if (device_features & BIT_ULL(VIRTIO_NET_F_STATUS)) { 3130 + if (get_link_state(mvdev)) 3131 + ndev->config.status |= cpu_to_mlx5vdpa16(mvdev, VIRTIO_NET_S_LINK_UP); 3132 + else 3133 + ndev->config.status &= cpu_to_mlx5vdpa16(mvdev, ~VIRTIO_NET_S_LINK_UP); 3134 + } 3189 3135 3190 3136 if (add_config->mask & (1 << VDPA_ATTR_DEV_NET_CFG_MACADDR)) { 3191 3137 memcpy(ndev->config.mac, add_config->net.mac, ETH_ALEN); 3192 - } else { 3138 + /* No bother setting mac address in config if not going to provision _F_MAC */ 3139 + } else if ((add_config->mask & BIT_ULL(VDPA_ATTR_DEV_FEATURES)) == 0 || 3140 + device_features & BIT_ULL(VIRTIO_NET_F_MAC)) { 3193 3141 err = mlx5_query_nic_vport_mac_address(mdev, 0, 0, config->mac); 3194 3142 if (err) 3195 3143 goto err_alloc; ··· 3206 3142 err = mlx5_mpfs_add_mac(pfmdev, config->mac); 3207 3143 if (err) 3208 3144 goto err_alloc; 3209 - 3210 - ndev->mvdev.mlx_features |= BIT_ULL(VIRTIO_NET_F_MAC); 3145 + } else if ((add_config->mask & BIT_ULL(VDPA_ATTR_DEV_FEATURES)) == 0) { 3146 + /* 3147 + * We used to clear _F_MAC feature bit if seeing 3148 + * zero mac address when device features are not 3149 + * specifically provisioned. Keep the behaviour 3150 + * so old scripts do not break. 
3151 + */ 3152 + device_features &= ~BIT_ULL(VIRTIO_NET_F_MAC); 3153 + } else if (device_features & BIT_ULL(VIRTIO_NET_F_MAC)) { 3154 + /* Don't provision zero mac address for _F_MAC */ 3155 + mlx5_vdpa_warn(&ndev->mvdev, 3156 + "No mac address provisioned?\n"); 3157 + err = -EINVAL; 3158 + goto err_alloc; 3211 3159 } 3212 3160 3213 - config->max_virtqueue_pairs = cpu_to_mlx5vdpa16(mvdev, max_vqs / 2); 3161 + if (device_features & BIT_ULL(VIRTIO_NET_F_MQ)) 3162 + config->max_virtqueue_pairs = cpu_to_mlx5vdpa16(mvdev, max_vqs / 2); 3163 + 3164 + ndev->mvdev.mlx_features = device_features; 3214 3165 mvdev->vdev.dma_dev = &mdev->pdev->dev; 3215 3166 err = mlx5_vdpa_alloc_resources(&ndev->mvdev); 3216 3167 if (err) ··· 3257 3178 if (err) 3258 3179 goto err_reg; 3259 3180 3181 + mlx5_vdpa_add_debugfs(ndev); 3260 3182 mgtdev->ndev = ndev; 3261 3183 return 0; 3262 3184 ··· 3284 3204 struct mlx5_vdpa_net *ndev = to_mlx5_vdpa_ndev(mvdev); 3285 3205 struct workqueue_struct *wq; 3286 3206 3207 + mlx5_vdpa_remove_debugfs(ndev->debugfs); 3208 + ndev->debugfs = NULL; 3287 3209 if (ndev->nb_registered) { 3288 3210 ndev->nb_registered = false; 3289 3211 mlx5_notifier_unregister(mvdev->mdev, &ndev->nb); ··· 3325 3243 mgtdev->mgtdev.id_table = id_table; 3326 3244 mgtdev->mgtdev.config_attr_mask = BIT_ULL(VDPA_ATTR_DEV_NET_CFG_MACADDR) | 3327 3245 BIT_ULL(VDPA_ATTR_DEV_NET_CFG_MAX_VQP) | 3328 - BIT_ULL(VDPA_ATTR_DEV_NET_CFG_MTU); 3246 + BIT_ULL(VDPA_ATTR_DEV_NET_CFG_MTU) | 3247 + BIT_ULL(VDPA_ATTR_DEV_FEATURES); 3329 3248 mgtdev->mgtdev.max_supported_vqs = 3330 3249 MLX5_CAP_DEV_VDPA_EMULATION(mdev, max_num_virtio_queues) + 1; 3331 3250 mgtdev->mgtdev.supported_features = get_supported_features(mdev);
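The feature-provisioning check added to the device-add path above can be read in isolation. The following is a minimal user-space sketch, not the driver's code: `check_provisioned_features` and its parameters are hypothetical names, `BIT_ULL` is redefined locally, and only the two feature bit numbers (32 and 33, per the virtio specification) are taken as given. It assumes the same ordering as the hunk: reject provisioned bits outside the supported set, narrow the set, then require the two mandatory features.

```c
#include <errno.h>
#include <stdint.h>

/* Feature bit numbers as defined by the virtio specification. */
#define VIRTIO_F_VERSION_1       32
#define VIRTIO_F_ACCESS_PLATFORM 33

/* Local stand-in for the kernel macro of the same name. */
#define BIT_ULL(n) (1ULL << (n))

/*
 * Hypothetical helper mirroring the checks in the hunk above.
 * supported:       features the management device can offer
 * has_provisioned: non-zero if the user passed an explicit feature set
 * provisioned:     the user-supplied feature set
 * out:             receives the resulting device features on success
 */
static int check_provisioned_features(uint64_t supported, int has_provisioned,
				      uint64_t provisioned, uint64_t *out)
{
	const uint64_t minimum = BIT_ULL(VIRTIO_F_VERSION_1) |
				 BIT_ULL(VIRTIO_F_ACCESS_PLATFORM);
	uint64_t features = supported;

	if (has_provisioned) {
		/* Any requested bit outside the supported set is an error. */
		if (provisioned & ~supported)
			return -EINVAL;
		features &= provisioned;
	}

	/* Both mandatory features must survive the narrowing. */
	if ((features & minimum) != minimum)
		return -EOPNOTSUPP;

	*out = features;
	return 0;
}
```

Keeping the subset check ahead of the minimum-feature check means a request for an unsupported bit is reported as `-EINVAL` even when the mandatory bits are also missing, which matches the order of the two warnings in the diff.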

+94

drivers/vdpa/mlx5/net/mlx5_vnet.h

+/* SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB */
+/* Copyright (c) 2022, NVIDIA CORPORATION & AFFILIATES. All rights reserved. */
+
+#ifndef __MLX5_VNET_H__
+#define __MLX5_VNET_H__
+
+#include "mlx5_vdpa.h"
+
+#define to_mlx5_vdpa_ndev(__mvdev) \
+	container_of(__mvdev, struct mlx5_vdpa_net, mvdev)
+#define to_mvdev(__vdev) container_of((__vdev), struct mlx5_vdpa_dev, vdev)
+
+struct mlx5_vdpa_net_resources {
+	u32 tisn;
+	u32 tdn;
+	u32 tirn;
+	u32 rqtn;
+	bool valid;
+	struct dentry *tirn_dent;
+};
+
+#define MLX5V_MACVLAN_SIZE 256
+
+static inline u16 key2vid(u64 key)
+{
+	return (u16)(key >> 48) & 0xfff;
+}
+
+struct mlx5_vdpa_net {
+	struct mlx5_vdpa_dev mvdev;
+	struct mlx5_vdpa_net_resources res;
+	struct virtio_net_config config;
+	struct mlx5_vdpa_virtqueue *vqs;
+	struct vdpa_callback *event_cbs;
+
+	/* Serialize vq resources creation and destruction. This is required
+	 * since memory map might change and we need to destroy and create
+	 * resources while the driver is operational.
+	 */
+	struct rw_semaphore reslock;
+	struct mlx5_flow_table *rxft;
+	struct dentry *rx_dent;
+	struct dentry *rx_table_dent;
+	bool setup;
+	u32 cur_num_vqs;
+	u32 rqt_size;
+	bool nb_registered;
+	struct notifier_block nb;
+	struct vdpa_callback config_cb;
+	struct mlx5_vdpa_wq_ent cvq_ent;
+	struct hlist_head macvlan_hash[MLX5V_MACVLAN_SIZE];
+	struct dentry *debugfs;
+};
+
+struct mlx5_vdpa_counter {
+	struct mlx5_fc *counter;
+	struct dentry *dent;
+	struct mlx5_core_dev *mdev;
+};
+
+struct macvlan_node {
+	struct hlist_node hlist;
+	struct mlx5_flow_handle *ucast_rule;
+	struct mlx5_flow_handle *mcast_rule;
+	u64 macvlan;
+	struct mlx5_vdpa_net *ndev;
+	bool tagged;
+#if defined(CONFIG_MLX5_VDPA_STEERING_DEBUG)
+	struct dentry *dent;
+	struct mlx5_vdpa_counter ucast_counter;
+	struct mlx5_vdpa_counter mcast_counter;
+#endif
+};
+
+void mlx5_vdpa_add_debugfs(struct mlx5_vdpa_net *ndev);
+void mlx5_vdpa_remove_debugfs(struct dentry *dbg);
+void mlx5_vdpa_add_rx_flow_table(struct mlx5_vdpa_net *ndev);
+void mlx5_vdpa_remove_rx_flow_table(struct mlx5_vdpa_net *ndev);
+void mlx5_vdpa_add_tirn(struct mlx5_vdpa_net *ndev);
+void mlx5_vdpa_remove_tirn(struct mlx5_vdpa_net *ndev);
+#if defined(CONFIG_MLX5_VDPA_STEERING_DEBUG)
+void mlx5_vdpa_add_rx_counters(struct mlx5_vdpa_net *ndev,
+			       struct macvlan_node *node);
+void mlx5_vdpa_remove_rx_counters(struct mlx5_vdpa_net *ndev,
+				  struct macvlan_node *node);
+#else
+static inline void mlx5_vdpa_add_rx_counters(struct mlx5_vdpa_net *ndev,
+					     struct macvlan_node *node) {}
+static inline void mlx5_vdpa_remove_rx_counters(struct mlx5_vdpa_net *ndev,
+						struct macvlan_node *node) {}
+#endif
+
+
+#endif /* __MLX5_VNET_H__ */
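The `key2vid()` helper in this header recovers a 12-bit VLAN ID from bits 48..59 of the 64-bit `macvlan` key. As a rough user-space illustration of that layout: `key2vid` below is copied from the header with fixed-width types, while `make_key` is a hypothetical packer written for this sketch. It assumes the MAC address occupies the low 48 bits, which this diff does not show.

```c
#include <stdint.h>

/* Same expression as key2vid() in mlx5_vnet.h, using stdint types. */
static inline uint16_t key2vid(uint64_t key)
{
	return (uint16_t)(key >> 48) & 0xfff;
}

/*
 * Hypothetical packer (not part of the driver): places the 12-bit VID in
 * bits 48..59 and, by assumption, the MAC address in the low 48 bits with
 * mac[0] as the most significant byte.
 */
static inline uint64_t make_key(const uint8_t mac[6], uint16_t vid)
{
	uint64_t key = (uint64_t)(vid & 0xfff) << 48;
	int i;

	for (i = 0; i < 6; i++)
		key |= (uint64_t)mac[i] << (8 * (5 - i));

	return key;
}
```

Storing the VID inside the key is what lets `mlx5_vdpa_add_mac_vlan_rules()` take a single `struct macvlan_node *` instead of separate `vid` and `tagged` arguments.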

+6

drivers/vdpa/solidrun/Makefile

+# SPDX-License-Identifier: GPL-2.0
+obj-$(CONFIG_SNET_VDPA) += snet_vdpa.o
+snet_vdpa-$(CONFIG_SNET_VDPA) += snet_main.o
+ifdef CONFIG_HWMON
+snet_vdpa-$(CONFIG_SNET_VDPA) += snet_hwmon.o
+endif

+188

drivers/vdpa/solidrun/snet_hwmon.c

+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * SolidRun DPU driver for control plane
+ *
+ * Copyright (C) 2022 SolidRun
+ *
+ * Author: Alvaro Karsz <alvaro.karsz@solid-run.com>
+ *
+ */
+#include <linux/hwmon.h>
+
+#include "snet_vdpa.h"
+
+/* Monitor offsets */
+#define SNET_MON_TMP0_IN_OFF      0x00
+#define SNET_MON_TMP0_MAX_OFF     0x08
+#define SNET_MON_TMP0_CRIT_OFF    0x10
+#define SNET_MON_TMP1_IN_OFF      0x18
+#define SNET_MON_TMP1_CRIT_OFF    0x20
+#define SNET_MON_CURR_IN_OFF      0x28
+#define SNET_MON_CURR_MAX_OFF     0x30
+#define SNET_MON_CURR_CRIT_OFF    0x38
+#define SNET_MON_PWR_IN_OFF       0x40
+#define SNET_MON_VOLT_IN_OFF      0x48
+#define SNET_MON_VOLT_CRIT_OFF    0x50
+#define SNET_MON_VOLT_LCRIT_OFF   0x58
+
+static void snet_hwmon_read_reg(struct psnet *psnet, u32 reg, long *out)
+{
+	*out = psnet_read64(psnet, psnet->cfg.hwmon_off + reg);
+}
+
+static umode_t snet_howmon_is_visible(const void *data,
+				      enum hwmon_sensor_types type,
+				      u32 attr, int channel)
+{
+	return 0444;
+}
+
+static int snet_howmon_read(struct device *dev, enum hwmon_sensor_types type,
+			    u32 attr, int channel, long *val)
+{
+	struct psnet *psnet = dev_get_drvdata(dev);
+	int ret = 0;
+
+	switch (type) {
+	case hwmon_in:
+		switch (attr) {
+		case hwmon_in_lcrit:
+			snet_hwmon_read_reg(psnet, SNET_MON_VOLT_LCRIT_OFF, val);
+			break;
+		case hwmon_in_crit:
+			snet_hwmon_read_reg(psnet, SNET_MON_VOLT_CRIT_OFF, val);
+			break;
+		case hwmon_in_input:
+			snet_hwmon_read_reg(psnet, SNET_MON_VOLT_IN_OFF, val);
+			break;
+		default:
+			ret = -EOPNOTSUPP;
+			break;
+		}
+		break;
+
+	case hwmon_power:
+		switch (attr) {
+		case hwmon_power_input:
+			snet_hwmon_read_reg(psnet, SNET_MON_PWR_IN_OFF, val);
+			break;
+
+		default:
+			ret = -EOPNOTSUPP;
+			break;
+		}
+		break;
+
+	case hwmon_curr:
+		switch (attr) {
+		case hwmon_curr_input:
+			snet_hwmon_read_reg(psnet, SNET_MON_CURR_IN_OFF, val);
+			break;
+		case hwmon_curr_max:
+			snet_hwmon_read_reg(psnet, SNET_MON_CURR_MAX_OFF, val);
+			break;
+		case hwmon_curr_crit:
+			snet_hwmon_read_reg(psnet, SNET_MON_CURR_CRIT_OFF, val);
+			break;
+		default:
+			ret = -EOPNOTSUPP;
+			break;
+		}
+		break;
+
+	case hwmon_temp:
+		switch (attr) {
+		case hwmon_temp_input:
+			if (channel == 0)
+				snet_hwmon_read_reg(psnet, SNET_MON_TMP0_IN_OFF, val);
+			else
+				snet_hwmon_read_reg(psnet, SNET_MON_TMP1_IN_OFF, val);
+			break;
+		case hwmon_temp_max:
+			if (channel == 0)
+				snet_hwmon_read_reg(psnet, SNET_MON_TMP0_MAX_OFF, val);
+			else
+				ret = -EOPNOTSUPP;
+			break;
+		case hwmon_temp_crit:
+			if (channel == 0)
+				snet_hwmon_read_reg(psnet, SNET_MON_TMP0_CRIT_OFF, val);
+			else
+				snet_hwmon_read_reg(psnet, SNET_MON_TMP1_CRIT_OFF, val);
+			break;
+
+		default:
+			ret = -EOPNOTSUPP;
+			break;
+		}
+		break;
+
+	default:
+		ret = -EOPNOTSUPP;
+		break;
+	}
+	return ret;
+}
+
+static int snet_hwmon_read_string(struct device *dev,
+				  enum hwmon_sensor_types type, u32 attr,
+				  int channel, const char **str)
+{
+	int ret = 0;
+
+	switch (type) {
+	case hwmon_in:
+		*str = "main_vin";
+		break;
+	case hwmon_power:
+		*str = "soc_pin";
+		break;
+	case hwmon_curr:
+		*str = "soc_iin";
+		break;
+	case hwmon_temp:
+		if (channel == 0)
+			*str = "power_stage_temp";
+		else
+			*str = "ic_junction_temp";
+		break;
+	default:
+		ret = -EOPNOTSUPP;
+		break;
+	}
+	return ret;
+}
+
+static const struct hwmon_ops snet_hwmon_ops = {
+	.is_visible = snet_howmon_is_visible,
+	.read = snet_howmon_read,
+	.read_string = snet_hwmon_read_string
+};
+
+static const struct hwmon_channel_info *snet_hwmon_info[] = {
+	HWMON_CHANNEL_INFO(temp, HWMON_T_INPUT | HWMON_T_MAX | HWMON_T_CRIT | HWMON_T_LABEL,
+			   HWMON_T_INPUT | HWMON_T_CRIT | HWMON_T_LABEL),
+	HWMON_CHANNEL_INFO(power, HWMON_P_INPUT | HWMON_P_LABEL),
+	HWMON_CHANNEL_INFO(curr, HWMON_C_INPUT | HWMON_C_MAX | HWMON_C_CRIT | HWMON_C_LABEL),
+	HWMON_CHANNEL_INFO(in, HWMON_I_INPUT | HWMON_I_CRIT | HWMON_I_LCRIT | HWMON_I_LABEL),
+	NULL
+};
+
+static const struct hwmon_chip_info snet_hwmono_info = {
+	.ops = &snet_hwmon_ops,
+	.info = snet_hwmon_info,
+};
+
+/* Create an HW monitor device */
+void psnet_create_hwmon(struct pci_dev *pdev)
+{
+	struct device *hwmon;
+	struct psnet *psnet = pci_get_drvdata(pdev);
+
+	snprintf(psnet->hwmon_name, SNET_NAME_SIZE, "snet_%s", pci_name(pdev));
+	hwmon = devm_hwmon_device_register_with_info(&pdev->dev, psnet->hwmon_name, psnet,
+						     &snet_hwmono_info, NULL);
+	/* The monitor is not mandatory, Just alert user in case of an error */
+	if (IS_ERR(hwmon))
+		SNET_WARN(pdev, "Failed to create SNET hwmon, error %ld\n", PTR_ERR(hwmon));
+}

+1111

drivers/vdpa/solidrun/snet_main.c

··· 1 + // SPDX-License-Identifier: GPL-2.0-only 2 + /* 3 + * SolidRun DPU driver for control plane 4 + * 5 + * Copyright (C) 2022 SolidRun 6 + * 7 + * Author: Alvaro Karsz <alvaro.karsz@solid-run.com> 8 + * 9 + */ 10 + #include <linux/iopoll.h> 11 + 12 + #include "snet_vdpa.h" 13 + 14 + /* SNET DPU device ID */ 15 + #define SNET_DEVICE_ID 0x1000 16 + /* SNET signature */ 17 + #define SNET_SIGNATURE 0xD0D06363 18 + /* Max. config version that we can work with */ 19 + #define SNET_CFG_VERSION 0x1 20 + /* Queue align */ 21 + #define SNET_QUEUE_ALIGNMENT PAGE_SIZE 22 + /* Kick value to notify that new data is available */ 23 + #define SNET_KICK_VAL 0x1 24 + #define SNET_CONFIG_OFF 0x0 25 + /* ACK timeout for a message */ 26 + #define SNET_ACK_TIMEOUT 2000000 27 + /* How long we are willing to wait for a SNET device */ 28 + #define SNET_DETECT_TIMEOUT 5000000 29 + /* How long should we wait for the DPU to read our config */ 30 + #define SNET_READ_CFG_TIMEOUT 3000000 31 + /* Size of configs written to the DPU */ 32 + #define SNET_GENERAL_CFG_LEN 36 33 + #define SNET_GENERAL_CFG_VQ_LEN 40 34 + 35 + enum snet_msg { 36 + SNET_MSG_DESTROY = 1, 37 + }; 38 + 39 + static struct snet *vdpa_to_snet(struct vdpa_device *vdpa) 40 + { 41 + return container_of(vdpa, struct snet, vdpa); 42 + } 43 + 44 + static int snet_wait_for_msg_ack(struct snet *snet) 45 + { 46 + struct pci_dev *pdev = snet->pdev; 47 + int ret; 48 + u32 val; 49 + 50 + /* The DPU will clear the messages offset once messages 51 + * are processed. 52 + */ 53 + ret = readx_poll_timeout(ioread32, snet->bar + snet->psnet->cfg.msg_off, 54 + val, !val, 10, SNET_ACK_TIMEOUT); 55 + if (ret) 56 + SNET_WARN(pdev, "Timeout waiting for message ACK\n"); 57 + 58 + return ret; 59 + } 60 + 61 + /* Sends a message to the DPU. 62 + * If blocking is set, the function will return once the 63 + * message was processed by the DPU (or timeout). 
+ */
+static int snet_send_msg(struct snet *snet, u32 msg, bool blocking)
+{
+	int ret = 0;
+
+	/* Make sure the DPU acked last message before issuing a new one */
+	ret = snet_wait_for_msg_ack(snet);
+	if (ret)
+		return ret;
+
+	/* Write the message */
+	snet_write32(snet, snet->psnet->cfg.msg_off, msg);
+
+	if (blocking)
+		ret = snet_wait_for_msg_ack(snet);
+	else /* If non-blocking, flush the write by issuing a read */
+		snet_read32(snet, snet->psnet->cfg.msg_off);
+
+	return ret;
+}
+
+static irqreturn_t snet_cfg_irq_hndlr(int irq, void *data)
+{
+	struct snet *snet = data;
+	/* Call callback if any */
+	if (snet->cb.callback)
+		return snet->cb.callback(snet->cb.private);
+
+	return IRQ_HANDLED;
+}
+
+static irqreturn_t snet_vq_irq_hndlr(int irq, void *data)
+{
+	struct snet_vq *vq = data;
+	/* Call callback if any */
+	if (vq->cb.callback)
+		return vq->cb.callback(vq->cb.private);
+
+	return IRQ_HANDLED;
+}
+
+static void snet_free_irqs(struct snet *snet)
+{
+	struct psnet *psnet = snet->psnet;
+	struct pci_dev *pdev;
+	u32 i;
+
+	/* Which Device allocated the IRQs? */
+	if (PSNET_FLAG_ON(psnet, SNET_CFG_FLAG_IRQ_PF))
+		pdev = snet->pdev->physfn;
+	else
+		pdev = snet->pdev;
+
+	/* Free config's IRQ */
+	if (snet->cfg_irq != -1) {
+		devm_free_irq(&pdev->dev, snet->cfg_irq, snet);
+		snet->cfg_irq = -1;
+	}
+	/* Free VQ IRQs */
+	for (i = 0; i < snet->cfg->vq_num; i++) {
+		if (snet->vqs[i] && snet->vqs[i]->irq != -1) {
+			devm_free_irq(&pdev->dev, snet->vqs[i]->irq, snet->vqs[i]);
+			snet->vqs[i]->irq = -1;
+		}
+	}
+
+	/* IRQ vectors are freed when the pci remove callback is called */
+}
+
+static int snet_set_vq_address(struct vdpa_device *vdev, u16 idx, u64 desc_area,
+			       u64 driver_area, u64 device_area)
+{
+	struct snet *snet = vdpa_to_snet(vdev);
+	/* save received parameters in vqueue struct */
+	snet->vqs[idx]->desc_area = desc_area;
+	snet->vqs[idx]->driver_area = driver_area;
+	snet->vqs[idx]->device_area = device_area;
+
+	return 0;
+}
+
+static void snet_set_vq_num(struct vdpa_device *vdev, u16 idx, u32 num)
+{
+	struct snet *snet = vdpa_to_snet(vdev);
+	/* save num in vqueue */
+	snet->vqs[idx]->num = num;
+}
+
+static void snet_kick_vq(struct vdpa_device *vdev, u16 idx)
+{
+	struct snet *snet = vdpa_to_snet(vdev);
+	/* not ready - ignore */
+	if (!snet->vqs[idx]->ready)
+		return;
+
+	iowrite32(SNET_KICK_VAL, snet->vqs[idx]->kick_ptr);
+}
+
+static void snet_set_vq_cb(struct vdpa_device *vdev, u16 idx, struct vdpa_callback *cb)
+{
+	struct snet *snet = vdpa_to_snet(vdev);
+
+	snet->vqs[idx]->cb.callback = cb->callback;
+	snet->vqs[idx]->cb.private = cb->private;
+}
+
+static void snet_set_vq_ready(struct vdpa_device *vdev, u16 idx, bool ready)
+{
+	struct snet *snet = vdpa_to_snet(vdev);
+
+	snet->vqs[idx]->ready = ready;
+}
+
+static bool
snet_get_vq_ready(struct vdpa_device *vdev, u16 idx) 178 + { 179 + struct snet *snet = vdpa_to_snet(vdev); 180 + 181 + return snet->vqs[idx]->ready; 182 + } 183 + 184 + static int snet_set_vq_state(struct vdpa_device *vdev, u16 idx, const struct vdpa_vq_state *state) 185 + { 186 + struct snet *snet = vdpa_to_snet(vdev); 187 + /* Setting the VQ state is not supported. 188 + * If the asked state is the same as the initial one 189 + * we can ignore it. 190 + */ 191 + if (SNET_HAS_FEATURE(snet, VIRTIO_F_RING_PACKED)) { 192 + const struct vdpa_vq_state_packed *p = &state->packed; 193 + 194 + if (p->last_avail_counter == 1 && p->last_used_counter == 1 && 195 + p->last_avail_idx == 0 && p->last_used_idx == 0) 196 + return 0; 197 + } else { 198 + const struct vdpa_vq_state_split *s = &state->split; 199 + 200 + if (s->avail_index == 0) 201 + return 0; 202 + } 203 + 204 + return -EOPNOTSUPP; 205 + } 206 + 207 + static int snet_get_vq_state(struct vdpa_device *vdev, u16 idx, struct vdpa_vq_state *state) 208 + { 209 + /* Not supported */ 210 + return -EOPNOTSUPP; 211 + } 212 + 213 + static int snet_get_vq_irq(struct vdpa_device *vdev, u16 idx) 214 + { 215 + struct snet *snet = vdpa_to_snet(vdev); 216 + 217 + return snet->vqs[idx]->irq; 218 + } 219 + 220 + static u32 snet_get_vq_align(struct vdpa_device *vdev) 221 + { 222 + return (u32)SNET_QUEUE_ALIGNMENT; 223 + } 224 + 225 + static int snet_reset_dev(struct snet *snet) 226 + { 227 + struct pci_dev *pdev = snet->pdev; 228 + int ret = 0; 229 + u32 i; 230 + 231 + /* If status is 0, nothing to do */ 232 + if (!snet->status) 233 + return 0; 234 + 235 + /* If DPU started, send a destroy message */ 236 + if (snet->status & VIRTIO_CONFIG_S_DRIVER_OK) 237 + ret = snet_send_msg(snet, SNET_MSG_DESTROY, true); 238 + 239 + /* Clear VQs */ 240 + for (i = 0; i < snet->cfg->vq_num; i++) { 241 + if (!snet->vqs[i]) 242 + continue; 243 + snet->vqs[i]->cb.callback = NULL; 244 + snet->vqs[i]->cb.private = NULL; 245 + snet->vqs[i]->desc_area = 0; 
246 + snet->vqs[i]->device_area = 0; 247 + snet->vqs[i]->driver_area = 0; 248 + snet->vqs[i]->ready = false; 249 + } 250 + 251 + /* Clear config callback */ 252 + snet->cb.callback = NULL; 253 + snet->cb.private = NULL; 254 + /* Free IRQs */ 255 + snet_free_irqs(snet); 256 + /* Reset status */ 257 + snet->status = 0; 258 + snet->dpu_ready = false; 259 + 260 + if (ret) 261 + SNET_WARN(pdev, "Incomplete reset to SNET[%u] device\n", snet->sid); 262 + else 263 + SNET_DBG(pdev, "Reset SNET[%u] device\n", snet->sid); 264 + 265 + return 0; 266 + } 267 + 268 + static int snet_reset(struct vdpa_device *vdev) 269 + { 270 + struct snet *snet = vdpa_to_snet(vdev); 271 + 272 + return snet_reset_dev(snet); 273 + } 274 + 275 + static size_t snet_get_config_size(struct vdpa_device *vdev) 276 + { 277 + struct snet *snet = vdpa_to_snet(vdev); 278 + 279 + return (size_t)snet->cfg->cfg_size; 280 + } 281 + 282 + static u64 snet_get_features(struct vdpa_device *vdev) 283 + { 284 + struct snet *snet = vdpa_to_snet(vdev); 285 + 286 + return snet->cfg->features; 287 + } 288 + 289 + static int snet_set_drv_features(struct vdpa_device *vdev, u64 features) 290 + { 291 + struct snet *snet = vdpa_to_snet(vdev); 292 + 293 + snet->negotiated_features = snet->cfg->features & features; 294 + return 0; 295 + } 296 + 297 + static u64 snet_get_drv_features(struct vdpa_device *vdev) 298 + { 299 + struct snet *snet = vdpa_to_snet(vdev); 300 + 301 + return snet->negotiated_features; 302 + } 303 + 304 + static u16 snet_get_vq_num_max(struct vdpa_device *vdev) 305 + { 306 + struct snet *snet = vdpa_to_snet(vdev); 307 + 308 + return (u16)snet->cfg->vq_size; 309 + } 310 + 311 + static void snet_set_config_cb(struct vdpa_device *vdev, struct vdpa_callback *cb) 312 + { 313 + struct snet *snet = vdpa_to_snet(vdev); 314 + 315 + snet->cb.callback = cb->callback; 316 + snet->cb.private = cb->private; 317 + } 318 + 319 + static u32 snet_get_device_id(struct vdpa_device *vdev) 320 + { 321 + struct snet *snet = 
vdpa_to_snet(vdev); 322 + 323 + return snet->cfg->virtio_id; 324 + } 325 + 326 + static u32 snet_get_vendor_id(struct vdpa_device *vdev) 327 + { 328 + return (u32)PCI_VENDOR_ID_SOLIDRUN; 329 + } 330 + 331 + static u8 snet_get_status(struct vdpa_device *vdev) 332 + { 333 + struct snet *snet = vdpa_to_snet(vdev); 334 + 335 + return snet->status; 336 + } 337 + 338 + static int snet_write_conf(struct snet *snet) 339 + { 340 + u32 off, i, tmp; 341 + int ret; 342 + 343 + /* No need to write the config twice */ 344 + if (snet->dpu_ready) 345 + return true; 346 + 347 + /* Snet data : 348 + * 349 + * General data: SNET_GENERAL_CFG_LEN bytes long 350 + * 0 0x4 0x8 0xC 0x10 0x14 0x1C 0x24 351 + * | MAGIC NUMBER | CFG VER | SNET SID | NUMBER OF QUEUES | IRQ IDX | FEATURES | RSVD | 352 + * 353 + * For every VQ: SNET_GENERAL_CFG_VQ_LEN bytes long 354 + * 0 0x4 0x8 355 + * | VQ SID AND QUEUE SIZE | IRQ Index | 356 + * | DESC AREA | 357 + * | DEVICE AREA | 358 + * | DRIVER AREA | 359 + * | RESERVED | 360 + * 361 + * Magic number should be written last, this is the DPU indication that the data is ready 362 + */ 363 + 364 + /* Init offset */ 365 + off = snet->psnet->cfg.host_cfg_off; 366 + 367 + /* Ignore magic number for now */ 368 + off += 4; 369 + snet_write32(snet, off, snet->psnet->negotiated_cfg_ver); 370 + off += 4; 371 + snet_write32(snet, off, snet->sid); 372 + off += 4; 373 + snet_write32(snet, off, snet->cfg->vq_num); 374 + off += 4; 375 + snet_write32(snet, off, snet->cfg_irq_idx); 376 + off += 4; 377 + snet_write64(snet, off, snet->negotiated_features); 378 + off += 8; 379 + /* Ignore reserved */ 380 + off += 8; 381 + /* Write VQs */ 382 + for (i = 0 ; i < snet->cfg->vq_num ; i++) { 383 + tmp = (i << 16) | (snet->vqs[i]->num & 0xFFFF); 384 + snet_write32(snet, off, tmp); 385 + off += 4; 386 + snet_write32(snet, off, snet->vqs[i]->irq_idx); 387 + off += 4; 388 + snet_write64(snet, off, snet->vqs[i]->desc_area); 389 + off += 8; 390 + snet_write64(snet, off, 
snet->vqs[i]->device_area); 391 + off += 8; 392 + snet_write64(snet, off, snet->vqs[i]->driver_area); 393 + off += 8; 394 + /* Ignore reserved */ 395 + off += 8; 396 + } 397 + 398 + /* Clear snet messages address for this device */ 399 + snet_write32(snet, snet->psnet->cfg.msg_off, 0); 400 + /* Write magic number - data is ready */ 401 + snet_write32(snet, snet->psnet->cfg.host_cfg_off, SNET_SIGNATURE); 402 + 403 + /* The DPU will ACK the config by clearing the signature */ 404 + ret = readx_poll_timeout(ioread32, snet->bar + snet->psnet->cfg.host_cfg_off, 405 + tmp, !tmp, 10, SNET_READ_CFG_TIMEOUT); 406 + if (ret) { 407 + SNET_ERR(snet->pdev, "Timeout waiting for the DPU to read the config\n"); 408 + return false; 409 + } 410 + 411 + /* set DPU flag */ 412 + snet->dpu_ready = true; 413 + 414 + return true; 415 + } 416 + 417 + static int snet_request_irqs(struct pci_dev *pdev, struct snet *snet) 418 + { 419 + int ret, i, irq; 420 + 421 + /* Request config IRQ */ 422 + irq = pci_irq_vector(pdev, snet->cfg_irq_idx); 423 + ret = devm_request_irq(&pdev->dev, irq, snet_cfg_irq_hndlr, 0, 424 + snet->cfg_irq_name, snet); 425 + if (ret) { 426 + SNET_ERR(pdev, "Failed to request IRQ\n"); 427 + return ret; 428 + } 429 + snet->cfg_irq = irq; 430 + 431 + /* Request IRQ for every VQ */ 432 + for (i = 0; i < snet->cfg->vq_num; i++) { 433 + irq = pci_irq_vector(pdev, snet->vqs[i]->irq_idx); 434 + ret = devm_request_irq(&pdev->dev, irq, snet_vq_irq_hndlr, 0, 435 + snet->vqs[i]->irq_name, snet->vqs[i]); 436 + if (ret) { 437 + SNET_ERR(pdev, "Failed to request IRQ\n"); 438 + return ret; 439 + } 440 + snet->vqs[i]->irq = irq; 441 + } 442 + return 0; 443 + } 444 + 445 + static void snet_set_status(struct vdpa_device *vdev, u8 status) 446 + { 447 + struct snet *snet = vdpa_to_snet(vdev); 448 + struct psnet *psnet = snet->psnet; 449 + struct pci_dev *pdev = snet->pdev; 450 + int ret; 451 + bool pf_irqs; 452 + 453 + if (status == snet->status) 454 + return; 455 + 456 + if ((status & 
VIRTIO_CONFIG_S_DRIVER_OK) && 457 + !(snet->status & VIRTIO_CONFIG_S_DRIVER_OK)) { 458 + /* Request IRQs */ 459 + pf_irqs = PSNET_FLAG_ON(psnet, SNET_CFG_FLAG_IRQ_PF); 460 + ret = snet_request_irqs(pf_irqs ? pdev->physfn : pdev, snet); 461 + if (ret) 462 + goto set_err; 463 + 464 + /* Write config to the DPU */ 465 + if (snet_write_conf(snet)) { 466 + SNET_INFO(pdev, "Create SNET[%u] device\n", snet->sid); 467 + } else { 468 + snet_free_irqs(snet); 469 + goto set_err; 470 + } 471 + } 472 + 473 + /* Save the new status */ 474 + snet->status = status; 475 + return; 476 + 477 + set_err: 478 + snet->status |= VIRTIO_CONFIG_S_FAILED; 479 + } 480 + 481 + static void snet_get_config(struct vdpa_device *vdev, unsigned int offset, 482 + void *buf, unsigned int len) 483 + { 484 + struct snet *snet = vdpa_to_snet(vdev); 485 + void __iomem *cfg_ptr = snet->cfg->virtio_cfg + offset; 486 + u8 *buf_ptr = buf; 487 + u32 i; 488 + 489 + /* check for offset error */ 490 + if (offset + len > snet->cfg->cfg_size) 491 + return; 492 + 493 + /* Write into buffer */ 494 + for (i = 0; i < len; i++) 495 + *buf_ptr++ = ioread8(cfg_ptr + i); 496 + } 497 + 498 + static void snet_set_config(struct vdpa_device *vdev, unsigned int offset, 499 + const void *buf, unsigned int len) 500 + { 501 + struct snet *snet = vdpa_to_snet(vdev); 502 + void __iomem *cfg_ptr = snet->cfg->virtio_cfg + offset; 503 + const u8 *buf_ptr = buf; 504 + u32 i; 505 + 506 + /* check for offset error */ 507 + if (offset + len > snet->cfg->cfg_size) 508 + return; 509 + 510 + /* Write into PCI BAR */ 511 + for (i = 0; i < len; i++) 512 + iowrite8(*buf_ptr++, cfg_ptr + i); 513 + } 514 + 515 + static const struct vdpa_config_ops snet_config_ops = { 516 + .set_vq_address = snet_set_vq_address, 517 + .set_vq_num = snet_set_vq_num, 518 + .kick_vq = snet_kick_vq, 519 + .set_vq_cb = snet_set_vq_cb, 520 + .set_vq_ready = snet_set_vq_ready, 521 + .get_vq_ready = snet_get_vq_ready, 522 + .set_vq_state = snet_set_vq_state, 523 + 
.get_vq_state = snet_get_vq_state, 524 + .get_vq_irq = snet_get_vq_irq, 525 + .get_vq_align = snet_get_vq_align, 526 + .reset = snet_reset, 527 + .get_config_size = snet_get_config_size, 528 + .get_device_features = snet_get_features, 529 + .set_driver_features = snet_set_drv_features, 530 + .get_driver_features = snet_get_drv_features, 531 + .get_vq_num_min = snet_get_vq_num_max, 532 + .get_vq_num_max = snet_get_vq_num_max, 533 + .set_config_cb = snet_set_config_cb, 534 + .get_device_id = snet_get_device_id, 535 + .get_vendor_id = snet_get_vendor_id, 536 + .get_status = snet_get_status, 537 + .set_status = snet_set_status, 538 + .get_config = snet_get_config, 539 + .set_config = snet_set_config, 540 + }; 541 + 542 + static int psnet_open_pf_bar(struct pci_dev *pdev, struct psnet *psnet) 543 + { 544 + char name[50]; 545 + int ret, i, mask = 0; 546 + /* We don't know which BAR will be used to communicate.. 547 + * We will map every bar with len > 0. 548 + * 549 + * Later, we will discover the BAR and unmap all other BARs. 550 + */ 551 + for (i = 0; i < PCI_STD_NUM_BARS; i++) { 552 + if (pci_resource_len(pdev, i)) 553 + mask |= (1 << i); 554 + } 555 + 556 + /* No BAR can be used.. 
*/ 557 + if (!mask) { 558 + SNET_ERR(pdev, "Failed to find a PCI BAR\n"); 559 + return -ENODEV; 560 + } 561 + 562 + snprintf(name, sizeof(name), "psnet[%s]-bars", pci_name(pdev)); 563 + ret = pcim_iomap_regions(pdev, mask, name); 564 + if (ret) { 565 + SNET_ERR(pdev, "Failed to request and map PCI BARs\n"); 566 + return ret; 567 + } 568 + 569 + for (i = 0; i < PCI_STD_NUM_BARS; i++) { 570 + if (mask & (1 << i)) 571 + psnet->bars[i] = pcim_iomap_table(pdev)[i]; 572 + } 573 + 574 + return 0; 575 + } 576 + 577 + static int snet_open_vf_bar(struct pci_dev *pdev, struct snet *snet) 578 + { 579 + char name[50]; 580 + int ret; 581 + 582 + snprintf(name, sizeof(name), "snet[%s]-bar", pci_name(pdev)); 583 + /* Request and map BAR */ 584 + ret = pcim_iomap_regions(pdev, BIT(snet->psnet->cfg.vf_bar), name); 585 + if (ret) { 586 + SNET_ERR(pdev, "Failed to request and map PCI BAR for a VF\n"); 587 + return ret; 588 + } 589 + 590 + snet->bar = pcim_iomap_table(pdev)[snet->psnet->cfg.vf_bar]; 591 + 592 + return 0; 593 + } 594 + 595 + static void snet_free_cfg(struct snet_cfg *cfg) 596 + { 597 + u32 i; 598 + 599 + if (!cfg->devs) 600 + return; 601 + 602 + /* Free devices */ 603 + for (i = 0; i < cfg->devices_num; i++) { 604 + if (!cfg->devs[i]) 605 + break; 606 + 607 + kfree(cfg->devs[i]); 608 + } 609 + /* Free pointers to devices */ 610 + kfree(cfg->devs); 611 + } 612 + 613 + /* Detect which BAR is used for communication with the device. */ 614 + static int psnet_detect_bar(struct psnet *psnet, u32 off) 615 + { 616 + unsigned long exit_time; 617 + int i; 618 + 619 + exit_time = jiffies + usecs_to_jiffies(SNET_DETECT_TIMEOUT); 620 + 621 + /* SNET DPU will write SNET's signature when the config is ready. */ 622 + while (time_before(jiffies, exit_time)) { 623 + for (i = 0; i < PCI_STD_NUM_BARS; i++) { 624 + /* Is this BAR mapped? 
*/ 625 + if (!psnet->bars[i]) 626 + continue; 627 + 628 + if (ioread32(psnet->bars[i] + off) == SNET_SIGNATURE) 629 + return i; 630 + } 631 + usleep_range(1000, 10000); 632 + } 633 + 634 + return -ENODEV; 635 + } 636 + 637 + static void psnet_unmap_unused_bars(struct pci_dev *pdev, struct psnet *psnet) 638 + { 639 + int i, mask = 0; 640 + 641 + for (i = 0; i < PCI_STD_NUM_BARS; i++) { 642 + if (psnet->bars[i] && i != psnet->barno) 643 + mask |= (1 << i); 644 + } 645 + 646 + if (mask) 647 + pcim_iounmap_regions(pdev, mask); 648 + } 649 + 650 + /* Read SNET config from PCI BAR */ 651 + static int psnet_read_cfg(struct pci_dev *pdev, struct psnet *psnet) 652 + { 653 + struct snet_cfg *cfg = &psnet->cfg; 654 + u32 i, off; 655 + int barno; 656 + 657 + /* Move to where the config starts */ 658 + off = SNET_CONFIG_OFF; 659 + 660 + /* Find BAR used for communication */ 661 + barno = psnet_detect_bar(psnet, off); 662 + if (barno < 0) { 663 + SNET_ERR(pdev, "SNET config is not ready.\n"); 664 + return barno; 665 + } 666 + 667 + /* Save used BAR number and unmap all other BARs */ 668 + psnet->barno = barno; 669 + SNET_DBG(pdev, "Using BAR number %d\n", barno); 670 + 671 + psnet_unmap_unused_bars(pdev, psnet); 672 + 673 + /* load config from BAR */ 674 + cfg->key = psnet_read32(psnet, off); 675 + off += 4; 676 + cfg->cfg_size = psnet_read32(psnet, off); 677 + off += 4; 678 + cfg->cfg_ver = psnet_read32(psnet, off); 679 + off += 4; 680 + /* The negotiated config version is the lower one between this driver's config 681 + * and the DPU's. 
682 + */ 683 + psnet->negotiated_cfg_ver = min_t(u32, cfg->cfg_ver, SNET_CFG_VERSION); 684 + SNET_DBG(pdev, "SNET config version %u\n", psnet->negotiated_cfg_ver); 685 + 686 + cfg->vf_num = psnet_read32(psnet, off); 687 + off += 4; 688 + cfg->vf_bar = psnet_read32(psnet, off); 689 + off += 4; 690 + cfg->host_cfg_off = psnet_read32(psnet, off); 691 + off += 4; 692 + cfg->max_size_host_cfg = psnet_read32(psnet, off); 693 + off += 4; 694 + cfg->virtio_cfg_off = psnet_read32(psnet, off); 695 + off += 4; 696 + cfg->kick_off = psnet_read32(psnet, off); 697 + off += 4; 698 + cfg->hwmon_off = psnet_read32(psnet, off); 699 + off += 4; 700 + cfg->msg_off = psnet_read32(psnet, off); 701 + off += 4; 702 + cfg->flags = psnet_read32(psnet, off); 703 + off += 4; 704 + /* Ignore Reserved */ 705 + off += sizeof(cfg->rsvd); 706 + 707 + cfg->devices_num = psnet_read32(psnet, off); 708 + off += 4; 709 + /* Allocate memory to hold pointer to the devices */ 710 + cfg->devs = kcalloc(cfg->devices_num, sizeof(void *), GFP_KERNEL); 711 + if (!cfg->devs) 712 + return -ENOMEM; 713 + 714 + /* Load device configuration from BAR */ 715 + for (i = 0; i < cfg->devices_num; i++) { 716 + cfg->devs[i] = kzalloc(sizeof(*cfg->devs[i]), GFP_KERNEL); 717 + if (!cfg->devs[i]) { 718 + snet_free_cfg(cfg); 719 + return -ENOMEM; 720 + } 721 + /* Read device config */ 722 + cfg->devs[i]->virtio_id = psnet_read32(psnet, off); 723 + off += 4; 724 + cfg->devs[i]->vq_num = psnet_read32(psnet, off); 725 + off += 4; 726 + cfg->devs[i]->vq_size = psnet_read32(psnet, off); 727 + off += 4; 728 + cfg->devs[i]->vfid = psnet_read32(psnet, off); 729 + off += 4; 730 + cfg->devs[i]->features = psnet_read64(psnet, off); 731 + off += 8; 732 + /* Ignore Reserved */ 733 + off += sizeof(cfg->devs[i]->rsvd); 734 + 735 + cfg->devs[i]->cfg_size = psnet_read32(psnet, off); 736 + off += 4; 737 + 738 + /* Is the config witten to the DPU going to be too big? 
*/ 739 + if (SNET_GENERAL_CFG_LEN + SNET_GENERAL_CFG_VQ_LEN * cfg->devs[i]->vq_num > 740 + cfg->max_size_host_cfg) { 741 + SNET_ERR(pdev, "Failed to read SNET config, the config is too big..\n"); 742 + snet_free_cfg(cfg); 743 + return -EINVAL; 744 + } 745 + } 746 + return 0; 747 + } 748 + 749 + static int psnet_alloc_irq_vector(struct pci_dev *pdev, struct psnet *psnet) 750 + { 751 + int ret = 0; 752 + u32 i, irq_num = 0; 753 + 754 + /* Let's count how many IRQs we need, 1 for every VQ + 1 for config change */ 755 + for (i = 0; i < psnet->cfg.devices_num; i++) 756 + irq_num += psnet->cfg.devs[i]->vq_num + 1; 757 + 758 + ret = pci_alloc_irq_vectors(pdev, irq_num, irq_num, PCI_IRQ_MSIX); 759 + if (ret != irq_num) { 760 + SNET_ERR(pdev, "Failed to allocate IRQ vectors\n"); 761 + return ret; 762 + } 763 + SNET_DBG(pdev, "Allocated %u IRQ vectors from physical function\n", irq_num); 764 + 765 + return 0; 766 + } 767 + 768 + static int snet_alloc_irq_vector(struct pci_dev *pdev, struct snet_dev_cfg *snet_cfg) 769 + { 770 + int ret = 0; 771 + u32 irq_num; 772 + 773 + /* We want 1 IRQ for every VQ + 1 for config change events */ 774 + irq_num = snet_cfg->vq_num + 1; 775 + 776 + ret = pci_alloc_irq_vectors(pdev, irq_num, irq_num, PCI_IRQ_MSIX); 777 + if (ret <= 0) { 778 + SNET_ERR(pdev, "Failed to allocate IRQ vectors\n"); 779 + return ret; 780 + } 781 + 782 + return 0; 783 + } 784 + 785 + static void snet_free_vqs(struct snet *snet) 786 + { 787 + u32 i; 788 + 789 + if (!snet->vqs) 790 + return; 791 + 792 + for (i = 0 ; i < snet->cfg->vq_num ; i++) { 793 + if (!snet->vqs[i]) 794 + break; 795 + 796 + kfree(snet->vqs[i]); 797 + } 798 + kfree(snet->vqs); 799 + } 800 + 801 + static int snet_build_vqs(struct snet *snet) 802 + { 803 + u32 i; 804 + /* Allocate the VQ pointers array */ 805 + snet->vqs = kcalloc(snet->cfg->vq_num, sizeof(void *), GFP_KERNEL); 806 + if (!snet->vqs) 807 + return -ENOMEM; 808 + 809 + /* Allocate the VQs */ 810 + for (i = 0; i < snet->cfg->vq_num; i++) 
{ 811 + snet->vqs[i] = kzalloc(sizeof(*snet->vqs[i]), GFP_KERNEL); 812 + if (!snet->vqs[i]) { 813 + snet_free_vqs(snet); 814 + return -ENOMEM; 815 + } 816 + /* Reset IRQ num */ 817 + snet->vqs[i]->irq = -1; 818 + /* VQ serial ID */ 819 + snet->vqs[i]->sid = i; 820 + /* Kick address - every VQ gets 4B */ 821 + snet->vqs[i]->kick_ptr = snet->bar + snet->psnet->cfg.kick_off + 822 + snet->vqs[i]->sid * 4; 823 + /* Clear kick address for this VQ */ 824 + iowrite32(0, snet->vqs[i]->kick_ptr); 825 + } 826 + return 0; 827 + } 828 + 829 + static int psnet_get_next_irq_num(struct psnet *psnet) 830 + { 831 + int irq; 832 + 833 + spin_lock(&psnet->lock); 834 + irq = psnet->next_irq++; 835 + spin_unlock(&psnet->lock); 836 + 837 + return irq; 838 + } 839 + 840 + static void snet_reserve_irq_idx(struct pci_dev *pdev, struct snet *snet) 841 + { 842 + struct psnet *psnet = snet->psnet; 843 + int i; 844 + 845 + /* one IRQ for every VQ, and one for config changes */ 846 + snet->cfg_irq_idx = psnet_get_next_irq_num(psnet); 847 + snprintf(snet->cfg_irq_name, SNET_NAME_SIZE, "snet[%s]-cfg[%d]", 848 + pci_name(pdev), snet->cfg_irq_idx); 849 + 850 + for (i = 0; i < snet->cfg->vq_num; i++) { 851 + /* Get next free IRQ ID */ 852 + snet->vqs[i]->irq_idx = psnet_get_next_irq_num(psnet); 853 + /* Write IRQ name */ 854 + snprintf(snet->vqs[i]->irq_name, SNET_NAME_SIZE, "snet[%s]-vq[%d]", 855 + pci_name(pdev), snet->vqs[i]->irq_idx); 856 + } 857 + } 858 + 859 + /* Find a device config based on virtual function id */ 860 + static struct snet_dev_cfg *snet_find_dev_cfg(struct snet_cfg *cfg, u32 vfid) 861 + { 862 + u32 i; 863 + 864 + for (i = 0; i < cfg->devices_num; i++) { 865 + if (cfg->devs[i]->vfid == vfid) 866 + return cfg->devs[i]; 867 + } 868 + /* Oppss.. no config found.. 
*/ 869 + return NULL; 870 + } 871 + 872 + /* Probe function for a physical PCI function */ 873 + static int snet_vdpa_probe_pf(struct pci_dev *pdev) 874 + { 875 + struct psnet *psnet; 876 + int ret = 0; 877 + bool pf_irqs = false; 878 + 879 + ret = pcim_enable_device(pdev); 880 + if (ret) { 881 + SNET_ERR(pdev, "Failed to enable PCI device\n"); 882 + return ret; 883 + } 884 + 885 + /* Allocate a PCI physical function device */ 886 + psnet = kzalloc(sizeof(*psnet), GFP_KERNEL); 887 + if (!psnet) 888 + return -ENOMEM; 889 + 890 + /* Init PSNET spinlock */ 891 + spin_lock_init(&psnet->lock); 892 + 893 + pci_set_master(pdev); 894 + pci_set_drvdata(pdev, psnet); 895 + 896 + /* Open SNET MAIN BAR */ 897 + ret = psnet_open_pf_bar(pdev, psnet); 898 + if (ret) 899 + goto free_psnet; 900 + 901 + /* Try to read SNET's config from PCI BAR */ 902 + ret = psnet_read_cfg(pdev, psnet); 903 + if (ret) 904 + goto free_psnet; 905 + 906 + /* If SNET_CFG_FLAG_IRQ_PF flag is set, we should use 907 + * PF MSI-X vectors 908 + */ 909 + pf_irqs = PSNET_FLAG_ON(psnet, SNET_CFG_FLAG_IRQ_PF); 910 + 911 + if (pf_irqs) { 912 + ret = psnet_alloc_irq_vector(pdev, psnet); 913 + if (ret) 914 + goto free_cfg; 915 + } 916 + 917 + SNET_DBG(pdev, "Enable %u virtual functions\n", psnet->cfg.vf_num); 918 + ret = pci_enable_sriov(pdev, psnet->cfg.vf_num); 919 + if (ret) { 920 + SNET_ERR(pdev, "Failed to enable SR-IOV\n"); 921 + goto free_irq; 922 + } 923 + 924 + /* Create HW monitor device */ 925 + if (PSNET_FLAG_ON(psnet, SNET_CFG_FLAG_HWMON)) { 926 + #if IS_ENABLED(CONFIG_HWMON) 927 + psnet_create_hwmon(pdev); 928 + #else 929 + SNET_WARN(pdev, "Can't start HWMON, CONFIG_HWMON is not enabled\n"); 930 + #endif 931 + } 932 + 933 + return 0; 934 + 935 + free_irq: 936 + if (pf_irqs) 937 + pci_free_irq_vectors(pdev); 938 + free_cfg: 939 + snet_free_cfg(&psnet->cfg); 940 + free_psnet: 941 + kfree(psnet); 942 + return ret; 943 + } 944 + 945 + /* Probe function for a virtual PCI function */ 946 + static int 
snet_vdpa_probe_vf(struct pci_dev *pdev) 947 + { 948 + struct pci_dev *pdev_pf = pdev->physfn; 949 + struct psnet *psnet = pci_get_drvdata(pdev_pf); 950 + struct snet_dev_cfg *dev_cfg; 951 + struct snet *snet; 952 + u32 vfid; 953 + int ret; 954 + bool pf_irqs = false; 955 + 956 + /* Get virtual function id. 957 + * (the DPU counts the VFs from 1) 958 + */ 959 + ret = pci_iov_vf_id(pdev); 960 + if (ret < 0) { 961 + SNET_ERR(pdev, "Failed to find a VF id\n"); 962 + return ret; 963 + } 964 + vfid = ret + 1; 965 + 966 + /* Find the snet_dev_cfg based on vfid */ 967 + dev_cfg = snet_find_dev_cfg(&psnet->cfg, vfid); 968 + if (!dev_cfg) { 969 + SNET_WARN(pdev, "Failed to find a VF config..\n"); 970 + return -ENODEV; 971 + } 972 + 973 + /* Which PCI device should allocate the IRQs? 974 + * If the SNET_CFG_FLAG_IRQ_PF flag set, the PF device allocates the IRQs 975 + */ 976 + pf_irqs = PSNET_FLAG_ON(psnet, SNET_CFG_FLAG_IRQ_PF); 977 + 978 + ret = pcim_enable_device(pdev); 979 + if (ret) { 980 + SNET_ERR(pdev, "Failed to enable PCI VF device\n"); 981 + return ret; 982 + } 983 + 984 + /* Request for MSI-X IRQs */ 985 + if (!pf_irqs) { 986 + ret = snet_alloc_irq_vector(pdev, dev_cfg); 987 + if (ret) 988 + return ret; 989 + } 990 + 991 + /* Allocate vdpa device */ 992 + snet = vdpa_alloc_device(struct snet, vdpa, &pdev->dev, &snet_config_ops, 1, 1, NULL, 993 + false); 994 + if (!snet) { 995 + SNET_ERR(pdev, "Failed to allocate a vdpa device\n"); 996 + ret = -ENOMEM; 997 + goto free_irqs; 998 + } 999 + 1000 + /* Save pci device pointer */ 1001 + snet->pdev = pdev; 1002 + snet->psnet = psnet; 1003 + snet->cfg = dev_cfg; 1004 + snet->dpu_ready = false; 1005 + snet->sid = vfid; 1006 + /* Reset IRQ value */ 1007 + snet->cfg_irq = -1; 1008 + 1009 + ret = snet_open_vf_bar(pdev, snet); 1010 + if (ret) 1011 + goto put_device; 1012 + 1013 + /* Create a VirtIO config pointer */ 1014 + snet->cfg->virtio_cfg = snet->bar + snet->psnet->cfg.virtio_cfg_off; 1015 + 1016 + pci_set_master(pdev); 
1017 + pci_set_drvdata(pdev, snet); 1018 + 1019 + ret = snet_build_vqs(snet); 1020 + if (ret) 1021 + goto put_device; 1022 + 1023 + /* Reserve IRQ indexes, 1024 + * The IRQs may be requested and freed multiple times, 1025 + * but the indexes won't change. 1026 + */ 1027 + snet_reserve_irq_idx(pf_irqs ? pdev_pf : pdev, snet); 1028 + 1029 + /*set DMA device*/ 1030 + snet->vdpa.dma_dev = &pdev->dev; 1031 + 1032 + /* Register VDPA device */ 1033 + ret = vdpa_register_device(&snet->vdpa, snet->cfg->vq_num); 1034 + if (ret) { 1035 + SNET_ERR(pdev, "Failed to register vdpa device\n"); 1036 + goto free_vqs; 1037 + } 1038 + 1039 + return 0; 1040 + 1041 + free_vqs: 1042 + snet_free_vqs(snet); 1043 + put_device: 1044 + put_device(&snet->vdpa.dev); 1045 + free_irqs: 1046 + if (!pf_irqs) 1047 + pci_free_irq_vectors(pdev); 1048 + return ret; 1049 + } 1050 + 1051 + static int snet_vdpa_probe(struct pci_dev *pdev, const struct pci_device_id *id) 1052 + { 1053 + if (pdev->is_virtfn) 1054 + return snet_vdpa_probe_vf(pdev); 1055 + else 1056 + return snet_vdpa_probe_pf(pdev); 1057 + } 1058 + 1059 + static void snet_vdpa_remove_pf(struct pci_dev *pdev) 1060 + { 1061 + struct psnet *psnet = pci_get_drvdata(pdev); 1062 + 1063 + pci_disable_sriov(pdev); 1064 + /* If IRQs are allocated from the PF, we should free the IRQs */ 1065 + if (PSNET_FLAG_ON(psnet, SNET_CFG_FLAG_IRQ_PF)) 1066 + pci_free_irq_vectors(pdev); 1067 + 1068 + snet_free_cfg(&psnet->cfg); 1069 + kfree(psnet); 1070 + } 1071 + 1072 + static void snet_vdpa_remove_vf(struct pci_dev *pdev) 1073 + { 1074 + struct snet *snet = pci_get_drvdata(pdev); 1075 + struct psnet *psnet = snet->psnet; 1076 + 1077 + vdpa_unregister_device(&snet->vdpa); 1078 + snet_free_vqs(snet); 1079 + /* If IRQs are allocated from the VF, we should free the IRQs */ 1080 + if (!PSNET_FLAG_ON(psnet, SNET_CFG_FLAG_IRQ_PF)) 1081 + pci_free_irq_vectors(pdev); 1082 + } 1083 + 1084 + static void snet_vdpa_remove(struct pci_dev *pdev) 1085 + { 1086 + if 
(pdev->is_virtfn) 1087 + snet_vdpa_remove_vf(pdev); 1088 + else 1089 + snet_vdpa_remove_pf(pdev); 1090 + } 1091 + 1092 + static struct pci_device_id snet_driver_pci_ids[] = { 1093 + { PCI_DEVICE_SUB(PCI_VENDOR_ID_SOLIDRUN, SNET_DEVICE_ID, 1094 + PCI_VENDOR_ID_SOLIDRUN, SNET_DEVICE_ID) }, 1095 + { 0 }, 1096 + }; 1097 + 1098 + MODULE_DEVICE_TABLE(pci, snet_driver_pci_ids); 1099 + 1100 + static struct pci_driver snet_vdpa_driver = { 1101 + .name = "snet-vdpa-driver", 1102 + .id_table = snet_driver_pci_ids, 1103 + .probe = snet_vdpa_probe, 1104 + .remove = snet_vdpa_remove, 1105 + }; 1106 + 1107 + module_pci_driver(snet_vdpa_driver); 1108 + 1109 + MODULE_AUTHOR("Alvaro Karsz <alvaro.karsz@solid-run.com>"); 1110 + MODULE_DESCRIPTION("SolidRun vDPA driver"); 1111 + MODULE_LICENSE("GPL v2");
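Note on IRQ accounting in snet_main.c: psnet_alloc_irq_vector() asks for one MSI-X vector per VQ plus one per device for config changes, and snet_reserve_irq_idx() later hands out indexes from psnet->next_irq, config index first and then one per VQ. The following is a minimal userspace sketch of that bookkeeping, not driver code. The `model_*` names and `struct model_dev` are hypothetical, and it assumes devices reserve their indexes one after another.

```c
#include <stdint.h>

/* Hypothetical stand-in for the per-device part of struct snet_dev_cfg. */
struct model_dev {
	uint32_t vq_num;       /* number of virtqueues of this device */
	uint32_t cfg_irq_idx;  /* index reserved for config-change events */
	uint32_t first_vq_idx; /* index of VQ 0; VQ n uses first_vq_idx + n */
};

/* Mirrors psnet_alloc_irq_vector(): one vector per VQ plus one per device. */
static uint32_t model_count_irqs(const struct model_dev *devs, uint32_t n)
{
	uint32_t i, irq_num = 0;

	for (i = 0; i < n; i++)
		irq_num += devs[i].vq_num + 1;
	return irq_num;
}

/*
 * Mirrors snet_reserve_irq_idx() called once per device, in order:
 * the config index is taken first, then one index per VQ.
 * Returns the next free index (the model's psnet->next_irq).
 */
static uint32_t model_reserve(struct model_dev *devs, uint32_t n)
{
	uint32_t i, next_irq = 0;

	for (i = 0; i < n; i++) {
		devs[i].cfg_irq_idx = next_irq++;
		devs[i].first_vq_idx = next_irq;
		next_irq += devs[i].vq_num;
	}
	return next_irq;
}
```

In this model, once every device has reserved its indexes, the next free index equals the number of vectors counted up front, so the reserved indexes stay within the allocated range.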

+194

drivers/vdpa/solidrun/snet_vdpa.h

··· 1 + /* SPDX-License-Identifier: GPL-2.0-only */ 2 + /* 3 + * SolidRun DPU driver for control plane 4 + * 5 + * Copyright (C) 2022 SolidRun 6 + * 7 + * Author: Alvaro Karsz <alvaro.karsz@solid-run.com> 8 + * 9 + */ 10 + #ifndef _SNET_VDPA_H_ 11 + #define _SNET_VDPA_H_ 12 + 13 + #include <linux/vdpa.h> 14 + #include <linux/pci.h> 15 + 16 + #define SNET_NAME_SIZE 256 17 + 18 + #define SNET_ERR(pdev, fmt, ...) dev_err(&(pdev)->dev, "%s"fmt, "snet_vdpa: ", ##__VA_ARGS__) 19 + #define SNET_WARN(pdev, fmt, ...) dev_warn(&(pdev)->dev, "%s"fmt, "snet_vdpa: ", ##__VA_ARGS__) 20 + #define SNET_INFO(pdev, fmt, ...) dev_info(&(pdev)->dev, "%s"fmt, "snet_vdpa: ", ##__VA_ARGS__) 21 + #define SNET_DBG(pdev, fmt, ...) dev_dbg(&(pdev)->dev, "%s"fmt, "snet_vdpa: ", ##__VA_ARGS__) 22 + #define SNET_HAS_FEATURE(s, f) ((s)->negotiated_features & BIT_ULL(f)) 23 + /* VQ struct */ 24 + struct snet_vq { 25 + /* VQ callback */ 26 + struct vdpa_callback cb; 27 + /* desc base address */ 28 + u64 desc_area; 29 + /* device base address */ 30 + u64 device_area; 31 + /* driver base address */ 32 + u64 driver_area; 33 + /* Queue size */ 34 + u32 num; 35 + /* Serial ID for VQ */ 36 + u32 sid; 37 + /* is ready flag */ 38 + bool ready; 39 + /* IRQ number */ 40 + u32 irq; 41 + /* IRQ index, DPU uses this to parse data from MSI-X table */ 42 + u32 irq_idx; 43 + /* IRQ name */ 44 + char irq_name[SNET_NAME_SIZE]; 45 + /* pointer to mapped PCI BAR register used by this VQ to kick */ 46 + void __iomem *kick_ptr; 47 + }; 48 + 49 + struct snet { 50 + /* vdpa device */ 51 + struct vdpa_device vdpa; 52 + /* Config callback */ 53 + struct vdpa_callback cb; 54 + /* array of virqueues */ 55 + struct snet_vq **vqs; 56 + /* Used features */ 57 + u64 negotiated_features; 58 + /* Device serial ID */ 59 + u32 sid; 60 + /* device status */ 61 + u8 status; 62 + /* boolean indicating if snet config was passed to the device */ 63 + bool dpu_ready; 64 + /* IRQ number */ 65 + u32 cfg_irq; 66 + /* IRQ index, DPU uses this 
to parse data from MSI-X table */ 67 + u32 cfg_irq_idx; 68 + /* IRQ name */ 69 + char cfg_irq_name[SNET_NAME_SIZE]; 70 + /* BAR to access the VF */ 71 + void __iomem *bar; 72 + /* PCI device */ 73 + struct pci_dev *pdev; 74 + /* Pointer to snet pdev parent device */ 75 + struct psnet *psnet; 76 + /* Pointer to snet config device */ 77 + struct snet_dev_cfg *cfg; 78 + }; 79 + 80 + struct snet_dev_cfg { 81 + /* Device ID following VirtIO spec. */ 82 + u32 virtio_id; 83 + /* Number of VQs for this device */ 84 + u32 vq_num; 85 + /* Size of every VQ */ 86 + u32 vq_size; 87 + /* Virtual Function id */ 88 + u32 vfid; 89 + /* Device features, following VirtIO spec */ 90 + u64 features; 91 + /* Reserved for future usage */ 92 + u32 rsvd[6]; 93 + /* VirtIO device specific config size */ 94 + u32 cfg_size; 95 + /* VirtIO device specific config address */ 96 + void __iomem *virtio_cfg; 97 + } __packed; 98 + 99 + struct snet_cfg { 100 + /* Magic key */ 101 + u32 key; 102 + /* Size of total config in bytes */ 103 + u32 cfg_size; 104 + /* Config version */ 105 + u32 cfg_ver; 106 + /* Number of Virtual Functions to create */ 107 + u32 vf_num; 108 + /* BAR to use for the VFs */ 109 + u32 vf_bar; 110 + /* Where should we write the SNET's config */ 111 + u32 host_cfg_off; 112 + /* Max. 
allowed size for a SNET's config */ 113 + u32 max_size_host_cfg; 114 + /* VirtIO config offset in BAR */ 115 + u32 virtio_cfg_off; 116 + /* Offset in PCI BAR for VQ kicks */ 117 + u32 kick_off; 118 + /* Offset in PCI BAR for HW monitoring */ 119 + u32 hwmon_off; 120 + /* Offset in PCI BAR for SNET messages */ 121 + u32 msg_off; 122 + /* Config general flags - enum snet_cfg_flags */ 123 + u32 flags; 124 + /* Reserved for future usage */ 125 + u32 rsvd[6]; 126 + /* Number of snet devices */ 127 + u32 devices_num; 128 + /* The actual devices */ 129 + struct snet_dev_cfg **devs; 130 + } __packed; 131 + 132 + /* SolidNET PCIe device, one device per PCIe physical function */ 133 + struct psnet { 134 + /* PCI BARs */ 135 + void __iomem *bars[PCI_STD_NUM_BARS]; 136 + /* Negotiated config version */ 137 + u32 negotiated_cfg_ver; 138 + /* Next IRQ index to use in case when the IRQs are allocated from this device */ 139 + u32 next_irq; 140 + /* BAR number used to communicate with the device */ 141 + u8 barno; 142 + /* spinlock to protect data that can be changed by SNET devices */ 143 + spinlock_t lock; 144 + /* Pointer to the device's config read from BAR */ 145 + struct snet_cfg cfg; 146 + /* Name of monitor device */ 147 + char hwmon_name[SNET_NAME_SIZE]; 148 + }; 149 + 150 + enum snet_cfg_flags { 151 + /* Create a HWMON device */ 152 + SNET_CFG_FLAG_HWMON = BIT(0), 153 + /* USE IRQs from the physical function */ 154 + SNET_CFG_FLAG_IRQ_PF = BIT(1), 155 + }; 156 + 157 + #define PSNET_FLAG_ON(p, f) ((p)->cfg.flags & (f)) 158 + 159 + static inline u32 psnet_read32(struct psnet *psnet, u32 off) 160 + { 161 + return ioread32(psnet->bars[psnet->barno] + off); 162 + } 163 + 164 + static inline u32 snet_read32(struct snet *snet, u32 off) 165 + { 166 + return ioread32(snet->bar + off); 167 + } 168 + 169 + static inline void snet_write32(struct snet *snet, u32 off, u32 val) 170 + { 171 + iowrite32(val, snet->bar + off); 172 + } 173 + 174 + static inline u64 psnet_read64(struct 
psnet *psnet, u32 off) 175 + { 176 + u64 val; 177 + /* 64bits are written in 2 halves, low part first */ 178 + val = (u64)psnet_read32(psnet, off); 179 + val |= ((u64)psnet_read32(psnet, off + 4) << 32); 180 + return val; 181 + } 182 + 183 + static inline void snet_write64(struct snet *snet, u32 off, u64 val) 184 + { 185 + /* The DPU expects a 64bit integer in 2 halves, the low part first */ 186 + snet_write32(snet, off, (u32)val); 187 + snet_write32(snet, off + 4, (u32)(val >> 32)); 188 + } 189 + 190 + #if IS_ENABLED(CONFIG_HWMON) 191 + void psnet_create_hwmon(struct pci_dev *pdev); 192 + #endif 193 + 194 + #endif //_SNET_VDPA_H_
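Note on the 64-bit accessors in snet_vdpa.h: psnet_read64() and snet_write64() move a 64-bit value as two 32-bit accesses, low half at `off` and high half at `off + 4`. The sketch below reproduces that ordering in plain userspace C. The `model_bar` array and `model_*` helpers are hypothetical stand-ins for the mapped BAR and for ioread32()/iowrite32(), used only for illustration.

```c
#include <stdint.h>

/* Hypothetical stand-in for a mapped BAR: an array of 32-bit registers. */
static uint32_t model_bar[4];

static void model_write32(uint32_t off, uint32_t val)
{
	model_bar[off / 4] = val;
}

static uint32_t model_read32(uint32_t off)
{
	return model_bar[off / 4];
}

/* Same order as snet_write64(): low half at off, high half at off + 4. */
static void model_write64(uint32_t off, uint64_t val)
{
	model_write32(off, (uint32_t)val);
	model_write32(off + 4, (uint32_t)(val >> 32));
}

/* Same reassembly as psnet_read64(). */
static uint64_t model_read64(uint32_t off)
{
	uint64_t val;

	val = (uint64_t)model_read32(off);
	val |= (uint64_t)model_read32(off + 4) << 32;
	return val;
}
```

Writing 0x1122334455667788 at offset 8 in this model leaves 0x55667788 at offset 8 and 0x11223344 at offset 12, and reading the pair back yields the original value.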

+93 -17

drivers/vdpa/vdpa.c

··· 39 39 u32 max_num, min_num = 1; 40 40 int ret = 0; 41 41 42 + d->dma_mask = &d->coherent_dma_mask; 43 + ret = dma_set_mask_and_coherent(d, DMA_BIT_MASK(64)); 44 + if (ret) 45 + return ret; 46 + 42 47 max_num = ops->get_vq_num_max(vdev); 43 48 if (ops->get_vq_num_min) 44 49 min_num = ops->get_vq_num_min(vdev); ··· 465 460 return 0; 466 461 } 467 462 463 + static u64 vdpa_mgmtdev_get_classes(const struct vdpa_mgmt_dev *mdev, 464 + unsigned int *nclasses) 465 + { 466 + u64 supported_classes = 0; 467 + unsigned int n = 0; 468 + 469 + for (int i = 0; mdev->id_table[i].device; i++) { 470 + if (mdev->id_table[i].device > 63) 471 + continue; 472 + supported_classes |= BIT_ULL(mdev->id_table[i].device); 473 + n++; 474 + } 475 + if (nclasses) 476 + *nclasses = n; 477 + 478 + return supported_classes; 479 + } 480 + 468 481 static int vdpa_mgmtdev_fill(const struct vdpa_mgmt_dev *mdev, struct sk_buff *msg, 469 482 u32 portid, u32 seq, int flags) 470 483 { 471 - u64 supported_classes = 0; 472 484 void *hdr; 473 - int i = 0; 474 485 int err; 475 486 476 487 hdr = genlmsg_put(msg, portid, seq, &vdpa_nl_family, flags, VDPA_CMD_MGMTDEV_NEW); ··· 496 475 if (err) 497 476 goto msg_err; 498 477 499 - while (mdev->id_table[i].device) { 500 - if (mdev->id_table[i].device <= 63) 501 - supported_classes |= BIT_ULL(mdev->id_table[i].device); 502 - i++; 503 - } 504 - 505 478 if (nla_put_u64_64bit(msg, VDPA_ATTR_MGMTDEV_SUPPORTED_CLASSES, 506 - supported_classes, VDPA_ATTR_UNSPEC)) { 479 + vdpa_mgmtdev_get_classes(mdev, NULL), 480 + VDPA_ATTR_UNSPEC)) { 507 481 err = -EMSGSIZE; 508 482 goto msg_err; 509 483 } ··· 582 566 BIT_ULL(VDPA_ATTR_DEV_NET_CFG_MTU) | \ 583 567 BIT_ULL(VDPA_ATTR_DEV_NET_CFG_MAX_VQP)) 584 568 569 + /* 570 + * Bitmask for all per-device features: feature bits VIRTIO_TRANSPORT_F_START 571 + * through VIRTIO_TRANSPORT_F_END are unset, i.e. 0xfffffc000fffffff for 572 + * all 64bit features. 
If the features are extended beyond 64 bits, or new 573 + * "holes" are reserved for other type of features than per-device, this 574 + * macro would have to be updated. 575 + */ 576 + #define VIRTIO_DEVICE_F_MASK (~0ULL << (VIRTIO_TRANSPORT_F_END + 1) | \ 577 + ((1ULL << VIRTIO_TRANSPORT_F_START) - 1)) 578 + 585 579 static int vdpa_nl_cmd_dev_add_set_doit(struct sk_buff *skb, struct genl_info *info) 586 580 { 587 581 struct vdpa_dev_set_config config = {}; 588 582 struct nlattr **nl_attrs = info->attrs; 589 583 struct vdpa_mgmt_dev *mdev; 584 + unsigned int ncls = 0; 590 585 const u8 *macaddr; 591 586 const char *name; 587 + u64 classes; 592 588 int err = 0; 593 589 594 590 if (!info->attrs[VDPA_ATTR_DEV_NAME]) ··· 629 601 config.mask |= BIT_ULL(VDPA_ATTR_DEV_NET_CFG_MAX_VQP); 630 602 } 631 603 if (nl_attrs[VDPA_ATTR_DEV_FEATURES]) { 604 + u64 missing = 0x0ULL; 605 + 632 606 config.device_features = 633 607 nla_get_u64(nl_attrs[VDPA_ATTR_DEV_FEATURES]); 608 + if (nl_attrs[VDPA_ATTR_DEV_NET_CFG_MACADDR] && 609 + !(config.device_features & BIT_ULL(VIRTIO_NET_F_MAC))) 610 + missing |= BIT_ULL(VIRTIO_NET_F_MAC); 611 + if (nl_attrs[VDPA_ATTR_DEV_NET_CFG_MTU] && 612 + !(config.device_features & BIT_ULL(VIRTIO_NET_F_MTU))) 613 + missing |= BIT_ULL(VIRTIO_NET_F_MTU); 614 + if (nl_attrs[VDPA_ATTR_DEV_NET_CFG_MAX_VQP] && 615 + config.net.max_vq_pairs > 1 && 616 + !(config.device_features & BIT_ULL(VIRTIO_NET_F_MQ))) 617 + missing |= BIT_ULL(VIRTIO_NET_F_MQ); 618 + if (missing) { 619 + NL_SET_ERR_MSG_FMT_MOD(info->extack, 620 + "Missing features 0x%llx for provided attributes", 621 + missing); 622 + return -EINVAL; 623 + } 634 624 config.mask |= BIT_ULL(VDPA_ATTR_DEV_FEATURES); 635 625 } 636 626 ··· 668 622 err = PTR_ERR(mdev); 669 623 goto err; 670 624 } 625 + 671 626 if ((config.mask & mdev->config_attr_mask) != config.mask) { 672 - NL_SET_ERR_MSG_MOD(info->extack, 673 - "All provided attributes are not supported"); 627 + NL_SET_ERR_MSG_FMT_MOD(info->extack, 628 + "Some 
provided attributes are not supported: 0x%llx", 629 + config.mask & ~mdev->config_attr_mask); 674 630 err = -EOPNOTSUPP; 631 + goto err; 632 + } 633 + 634 + classes = vdpa_mgmtdev_get_classes(mdev, &ncls); 635 + if (config.mask & VDPA_DEV_NET_ATTRS_MASK && 636 + !(classes & BIT_ULL(VIRTIO_ID_NET))) { 637 + NL_SET_ERR_MSG_MOD(info->extack, 638 + "Network class attributes provided on unsupported management device"); 639 + err = -EINVAL; 640 + goto err; 641 + } 642 + if (!(config.mask & VDPA_DEV_NET_ATTRS_MASK) && 643 + config.mask & BIT_ULL(VDPA_ATTR_DEV_FEATURES) && 644 + classes & BIT_ULL(VIRTIO_ID_NET) && ncls > 1 && 645 + config.device_features & VIRTIO_DEVICE_F_MASK) { 646 + NL_SET_ERR_MSG_MOD(info->extack, 647 + "Management device supports multi-class while device features specified are ambiguous"); 648 + err = -EINVAL; 675 649 goto err; 676 650 } 677 651 ··· 907 841 sizeof(config->mac), config->mac); 908 842 } 909 843 844 + static int vdpa_dev_net_status_config_fill(struct sk_buff *msg, u64 features, 845 + const struct virtio_net_config *config) 846 + { 847 + u16 val_u16; 848 + 849 + if ((features & BIT_ULL(VIRTIO_NET_F_STATUS)) == 0) 850 + return 0; 851 + 852 + val_u16 = __virtio16_to_cpu(true, config->status); 853 + return nla_put_u16(msg, VDPA_ATTR_DEV_NET_STATUS, val_u16); 854 + } 855 + 910 856 static int vdpa_dev_net_config_fill(struct vdpa_device *vdev, struct sk_buff *msg) 911 857 { 912 858 struct virtio_net_config config = {}; 913 859 u64 features_device; 914 - u16 val_u16; 915 860 916 861 vdev->config->get_config(vdev, 0, &config, sizeof(config)); 917 - 918 - val_u16 = __virtio16_to_cpu(true, config.status); 919 - if (nla_put_u16(msg, VDPA_ATTR_DEV_NET_STATUS, val_u16)) 920 - return -EMSGSIZE; 921 862 922 863 features_device = vdev->config->get_device_features(vdev); 923 864 ··· 936 863 return -EMSGSIZE; 937 864 938 865 if (vdpa_dev_net_mac_config_fill(msg, features_device, &config)) 866 + return -EMSGSIZE; 867 + 868 + if 
(vdpa_dev_net_status_config_fill(msg, features_device, &config)) 939 869 return -EMSGSIZE; 940 870 941 871 return vdpa_dev_net_mq_config_fill(msg, features_device, &config); ··· 1087 1011 switch (device_id) { 1088 1012 case VIRTIO_ID_NET: 1089 1013 if (index > VIRTIO_NET_CTRL_MQ_VQ_PAIRS_MAX) { 1090 - NL_SET_ERR_MSG_MOD(info->extack, "queue index excceeds max value"); 1014 + NL_SET_ERR_MSG_MOD(info->extack, "queue index exceeds max value"); 1091 1015 err = -ERANGE; 1092 1016 break; 1093 1017 }
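Note on VIRTIO_DEVICE_F_MASK in vdpa.c: the macro clears the transport feature bits and keeps every other bit, which the comment gives as 0xfffffc000fffffff. The sketch below recomputes that value in userspace. The `MODEL_*` names are hypothetical, and the values 28 and 41 for the transport range are an assumption inferred from the constant in the comment; the authoritative definitions are in include/uapi/linux/virtio_config.h.

```c
#include <stdint.h>

/* Assumed from the comment's 0xfffffc000fffffff; check virtio_config.h. */
#define MODEL_TRANSPORT_F_START 28
#define MODEL_TRANSPORT_F_END   41

/* Same shape as VIRTIO_DEVICE_F_MASK: all bits except START..END. */
#define MODEL_DEVICE_F_MASK \
	((~0ULL << (MODEL_TRANSPORT_F_END + 1)) | \
	 ((1ULL << MODEL_TRANSPORT_F_START) - 1))

/* Returns nonzero if any per-device (non-transport) feature bit is set. */
static int model_has_device_features(uint64_t features)
{
	return (features & MODEL_DEVICE_F_MASK) != 0;
}
```

Under these assumptions, a feature word holding only transport bits such as bit 32 (VIRTIO_F_VERSION_1) does not trip the check, while a device bit such as bit 5 (VIRTIO_NET_F_MAC) does. That matches the intent of the "ambiguous features" test in vdpa_nl_cmd_dev_add_set_doit().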

+78 -155

drivers/vdpa/vdpa_sim/vdpa_sim.c

··· 17 17 #include <linux/vringh.h> 18 18 #include <linux/vdpa.h> 19 19 #include <linux/vhost_iotlb.h> 20 - #include <linux/iova.h> 21 20 #include <uapi/linux/vdpa.h> 22 21 23 22 #include "vdpa_sim.h" ··· 44 45 return container_of(vdpa, struct vdpasim, vdpa); 45 46 } 46 47 47 - static struct vdpasim *dev_to_sim(struct device *dev) 48 - { 49 - struct vdpa_device *vdpa = dev_to_vdpa(dev); 50 - 51 - return vdpa_to_sim(vdpa); 52 - } 53 - 54 48 static void vdpasim_vq_notify(struct vringh *vring) 55 49 { 56 50 struct vdpasim_virtqueue *vq = ··· 58 66 static void vdpasim_queue_ready(struct vdpasim *vdpasim, unsigned int idx) 59 67 { 60 68 struct vdpasim_virtqueue *vq = &vdpasim->vqs[idx]; 69 + uint16_t last_avail_idx = vq->vring.last_avail_idx; 61 70 62 - vringh_init_iotlb(&vq->vring, vdpasim->features, vq->num, false, 71 + vringh_init_iotlb(&vq->vring, vdpasim->features, vq->num, true, 63 72 (struct vring_desc *)(uintptr_t)vq->desc_addr, 64 73 (struct vring_avail *) 65 74 (uintptr_t)vq->driver_addr, 66 75 (struct vring_used *) 67 76 (uintptr_t)vq->device_addr); 68 77 78 + vq->vring.last_avail_idx = last_avail_idx; 69 79 vq->vring.notify = vdpasim_vq_notify; 70 80 } 71 81 ··· 98 104 &vdpasim->iommu_lock); 99 105 } 100 106 101 - for (i = 0; i < vdpasim->dev_attr.nas; i++) 107 + for (i = 0; i < vdpasim->dev_attr.nas; i++) { 102 108 vhost_iotlb_reset(&vdpasim->iommu[i]); 109 + vhost_iotlb_add_range(&vdpasim->iommu[i], 0, ULONG_MAX, 110 + 0, VHOST_MAP_RW); 111 + vdpasim->iommu_pt[i] = true; 112 + } 103 113 104 114 vdpasim->running = true; 105 115 spin_unlock(&vdpasim->iommu_lock); ··· 113 115 ++vdpasim->generation; 114 116 } 115 117 116 - static int dir_to_perm(enum dma_data_direction dir) 117 - { 118 - int perm = -EFAULT; 119 - 120 - switch (dir) { 121 - case DMA_FROM_DEVICE: 122 - perm = VHOST_MAP_WO; 123 - break; 124 - case DMA_TO_DEVICE: 125 - perm = VHOST_MAP_RO; 126 - break; 127 - case DMA_BIDIRECTIONAL: 128 - perm = VHOST_MAP_RW; 129 - break; 130 - default: 131 - 
break; 132 - } 133 - 134 - return perm; 135 - } 136 - 137 - static dma_addr_t vdpasim_map_range(struct vdpasim *vdpasim, phys_addr_t paddr, 138 - size_t size, unsigned int perm) 139 - { 140 - struct iova *iova; 141 - dma_addr_t dma_addr; 142 - int ret; 143 - 144 - /* We set the limit_pfn to the maximum (ULONG_MAX - 1) */ 145 - iova = alloc_iova(&vdpasim->iova, size >> iova_shift(&vdpasim->iova), 146 - ULONG_MAX - 1, true); 147 - if (!iova) 148 - return DMA_MAPPING_ERROR; 149 - 150 - dma_addr = iova_dma_addr(&vdpasim->iova, iova); 151 - 152 - spin_lock(&vdpasim->iommu_lock); 153 - ret = vhost_iotlb_add_range(&vdpasim->iommu[0], (u64)dma_addr, 154 - (u64)dma_addr + size - 1, (u64)paddr, perm); 155 - spin_unlock(&vdpasim->iommu_lock); 156 - 157 - if (ret) { 158 - __free_iova(&vdpasim->iova, iova); 159 - return DMA_MAPPING_ERROR; 160 - } 161 - 162 - return dma_addr; 163 - } 164 - 165 - static void vdpasim_unmap_range(struct vdpasim *vdpasim, dma_addr_t dma_addr, 166 - size_t size) 167 - { 168 - spin_lock(&vdpasim->iommu_lock); 169 - vhost_iotlb_del_range(&vdpasim->iommu[0], (u64)dma_addr, 170 - (u64)dma_addr + size - 1); 171 - spin_unlock(&vdpasim->iommu_lock); 172 - 173 - free_iova(&vdpasim->iova, iova_pfn(&vdpasim->iova, dma_addr)); 174 - } 175 - 176 - static dma_addr_t vdpasim_map_page(struct device *dev, struct page *page, 177 - unsigned long offset, size_t size, 178 - enum dma_data_direction dir, 179 - unsigned long attrs) 180 - { 181 - struct vdpasim *vdpasim = dev_to_sim(dev); 182 - phys_addr_t paddr = page_to_phys(page) + offset; 183 - int perm = dir_to_perm(dir); 184 - 185 - if (perm < 0) 186 - return DMA_MAPPING_ERROR; 187 - 188 - return vdpasim_map_range(vdpasim, paddr, size, perm); 189 - } 190 - 191 - static void vdpasim_unmap_page(struct device *dev, dma_addr_t dma_addr, 192 - size_t size, enum dma_data_direction dir, 193 - unsigned long attrs) 194 - { 195 - struct vdpasim *vdpasim = dev_to_sim(dev); 196 - 197 - vdpasim_unmap_range(vdpasim, dma_addr, 
size); 198 - } 199 - 200 - static void *vdpasim_alloc_coherent(struct device *dev, size_t size, 201 - dma_addr_t *dma_addr, gfp_t flag, 202 - unsigned long attrs) 203 - { 204 - struct vdpasim *vdpasim = dev_to_sim(dev); 205 - phys_addr_t paddr; 206 - void *addr; 207 - 208 - addr = kmalloc(size, flag); 209 - if (!addr) { 210 - *dma_addr = DMA_MAPPING_ERROR; 211 - return NULL; 212 - } 213 - 214 - paddr = virt_to_phys(addr); 215 - 216 - *dma_addr = vdpasim_map_range(vdpasim, paddr, size, VHOST_MAP_RW); 217 - if (*dma_addr == DMA_MAPPING_ERROR) { 218 - kfree(addr); 219 - return NULL; 220 - } 221 - 222 - return addr; 223 - } 224 - 225 - static void vdpasim_free_coherent(struct device *dev, size_t size, 226 - void *vaddr, dma_addr_t dma_addr, 227 - unsigned long attrs) 228 - { 229 - struct vdpasim *vdpasim = dev_to_sim(dev); 230 - 231 - vdpasim_unmap_range(vdpasim, dma_addr, size); 232 - 233 - kfree(vaddr); 234 - } 235 - 236 - static const struct dma_map_ops vdpasim_dma_ops = { 237 - .map_page = vdpasim_map_page, 238 - .unmap_page = vdpasim_unmap_page, 239 - .alloc = vdpasim_alloc_coherent, 240 - .free = vdpasim_free_coherent, 241 - }; 242 - 243 118 static const struct vdpa_config_ops vdpasim_config_ops; 244 119 static const struct vdpa_config_ops vdpasim_batch_config_ops; 245 120 ··· 120 249 const struct vdpa_dev_set_config *config) 121 250 { 122 251 const struct vdpa_config_ops *ops; 252 + struct vdpa_device *vdpa; 123 253 struct vdpasim *vdpasim; 124 254 struct device *dev; 125 255 int i, ret = -ENOMEM; 256 + 257 + if (!dev_attr->alloc_size) 258 + return ERR_PTR(-EINVAL); 126 259 127 260 if (config->mask & BIT_ULL(VDPA_ATTR_DEV_FEATURES)) { 128 261 if (config->device_features & ··· 141 266 else 142 267 ops = &vdpasim_config_ops; 143 268 144 - vdpasim = vdpa_alloc_device(struct vdpasim, vdpa, NULL, ops, 145 - dev_attr->ngroups, dev_attr->nas, 146 - dev_attr->name, false); 147 - if (IS_ERR(vdpasim)) { 148 - ret = PTR_ERR(vdpasim); 269 + vdpa = __vdpa_alloc_device(NULL, 
ops, 270 + dev_attr->ngroups, dev_attr->nas, 271 + dev_attr->alloc_size, 272 + dev_attr->name, false); 273 + if (IS_ERR(vdpa)) { 274 + ret = PTR_ERR(vdpa); 149 275 goto err_alloc; 150 276 } 151 277 278 + vdpasim = vdpa_to_sim(vdpa); 152 279 vdpasim->dev_attr = *dev_attr; 153 280 INIT_WORK(&vdpasim->work, dev_attr->work_fn); 154 281 spin_lock_init(&vdpasim->lock); ··· 160 283 dev->dma_mask = &dev->coherent_dma_mask; 161 284 if (dma_set_mask_and_coherent(dev, DMA_BIT_MASK(64))) 162 285 goto err_iommu; 163 - set_dma_ops(dev, &vdpasim_dma_ops); 164 286 vdpasim->vdpa.mdev = dev_attr->mgmt_dev; 165 287 166 288 vdpasim->config = kzalloc(dev_attr->config_size, GFP_KERNEL); ··· 176 300 if (!vdpasim->iommu) 177 301 goto err_iommu; 178 302 303 + vdpasim->iommu_pt = kmalloc_array(vdpasim->dev_attr.nas, 304 + sizeof(*vdpasim->iommu_pt), GFP_KERNEL); 305 + if (!vdpasim->iommu_pt) 306 + goto err_iommu; 307 + 179 308 for (i = 0; i < vdpasim->dev_attr.nas; i++) 180 309 vhost_iotlb_init(&vdpasim->iommu[i], max_iotlb_entries, 0); 181 310 ··· 191 310 for (i = 0; i < dev_attr->nvqs; i++) 192 311 vringh_set_iotlb(&vdpasim->vqs[i].vring, &vdpasim->iommu[0], 193 312 &vdpasim->iommu_lock); 194 - 195 - ret = iova_cache_get(); 196 - if (ret) 197 - goto err_iommu; 198 - 199 - /* For simplicity we use an IOVA allocator with byte granularity */ 200 - init_iova_domain(&vdpasim->iova, 1, 0); 201 313 202 314 vdpasim->vdpa.dma_dev = dev; 203 315 ··· 229 355 { 230 356 struct vdpasim *vdpasim = vdpa_to_sim(vdpa); 231 357 struct vdpasim_virtqueue *vq = &vdpasim->vqs[idx]; 358 + 359 + if (!vdpasim->running && 360 + (vdpasim->status & VIRTIO_CONFIG_S_DRIVER_OK)) { 361 + vdpasim->pending_kick = true; 362 + return; 363 + } 232 364 233 365 if (vq->ready) 234 366 schedule_work(&vdpasim->work); ··· 296 416 297 417 state->split.avail_index = vrh->last_avail_idx; 298 418 return 0; 419 + } 420 + 421 + static int vdpasim_get_vq_stats(struct vdpa_device *vdpa, u16 idx, 422 + struct sk_buff *msg, 423 + struct 
netlink_ext_ack *extack) 424 + { 425 + struct vdpasim *vdpasim = vdpa_to_sim(vdpa); 426 + 427 + if (vdpasim->dev_attr.get_stats) 428 + return vdpasim->dev_attr.get_stats(vdpasim, idx, 429 + msg, extack); 430 + return -EOPNOTSUPP; 299 431 } 300 432 301 433 static u32 vdpasim_get_vq_align(struct vdpa_device *vdpa) ··· 418 526 return 0; 419 527 } 420 528 529 + static int vdpasim_resume(struct vdpa_device *vdpa) 530 + { 531 + struct vdpasim *vdpasim = vdpa_to_sim(vdpa); 532 + int i; 533 + 534 + spin_lock(&vdpasim->lock); 535 + vdpasim->running = true; 536 + 537 + if (vdpasim->pending_kick) { 538 + /* Process pending descriptors */ 539 + for (i = 0; i < vdpasim->dev_attr.nvqs; ++i) 540 + vdpasim_kick_vq(vdpa, i); 541 + 542 + vdpasim->pending_kick = false; 543 + } 544 + 545 + spin_unlock(&vdpasim->lock); 546 + 547 + return 0; 548 + } 549 + 421 550 static size_t vdpasim_get_config_size(struct vdpa_device *vdpa) 422 551 { 423 552 struct vdpasim *vdpasim = vdpa_to_sim(vdpa); ··· 534 621 535 622 iommu = &vdpasim->iommu[asid]; 536 623 vhost_iotlb_reset(iommu); 624 + vdpasim->iommu_pt[asid] = false; 537 625 538 626 for (map = vhost_iotlb_itree_first(iotlb, start, last); map; 539 627 map = vhost_iotlb_itree_next(map, start, last)) { ··· 563 649 return -EINVAL; 564 650 565 651 spin_lock(&vdpasim->iommu_lock); 652 + if (vdpasim->iommu_pt[asid]) { 653 + vhost_iotlb_reset(&vdpasim->iommu[asid]); 654 + vdpasim->iommu_pt[asid] = false; 655 + } 566 656 ret = vhost_iotlb_add_range_ctx(&vdpasim->iommu[asid], iova, 567 657 iova + size - 1, pa, perm, opaque); 568 658 spin_unlock(&vdpasim->iommu_lock); ··· 581 663 582 664 if (asid >= vdpasim->dev_attr.nas) 583 665 return -EINVAL; 666 + 667 + if (vdpasim->iommu_pt[asid]) { 668 + vhost_iotlb_reset(&vdpasim->iommu[asid]); 669 + vdpasim->iommu_pt[asid] = false; 670 + } 584 671 585 672 spin_lock(&vdpasim->iommu_lock); 586 673 vhost_iotlb_del_range(&vdpasim->iommu[asid], iova, iova + size - 1); ··· 606 683 
vringh_kiov_cleanup(&vdpasim->vqs[i].in_iov); 607 684 } 608 685 609 - if (vdpa_get_dma_dev(vdpa)) { 610 - put_iova_domain(&vdpasim->iova); 611 - iova_cache_put(); 612 - } 613 - 614 686 kvfree(vdpasim->buffer); 615 687 for (i = 0; i < vdpasim->dev_attr.nas; i++) 616 688 vhost_iotlb_reset(&vdpasim->iommu[i]); 617 689 kfree(vdpasim->iommu); 690 + kfree(vdpasim->iommu_pt); 618 691 kfree(vdpasim->vqs); 619 692 kfree(vdpasim->config); 620 693 } ··· 623 704 .set_vq_ready = vdpasim_set_vq_ready, 624 705 .get_vq_ready = vdpasim_get_vq_ready, 625 706 .set_vq_state = vdpasim_set_vq_state, 707 + .get_vendor_vq_stats = vdpasim_get_vq_stats, 626 708 .get_vq_state = vdpasim_get_vq_state, 627 709 .get_vq_align = vdpasim_get_vq_align, 628 710 .get_vq_group = vdpasim_get_vq_group, ··· 638 718 .set_status = vdpasim_set_status, 639 719 .reset = vdpasim_reset, 640 720 .suspend = vdpasim_suspend, 721 + .resume = vdpasim_resume, 641 722 .get_config_size = vdpasim_get_config_size, 642 723 .get_config = vdpasim_get_config, 643 724 .set_config = vdpasim_set_config, ··· 658 737 .set_vq_ready = vdpasim_set_vq_ready, 659 738 .get_vq_ready = vdpasim_get_vq_ready, 660 739 .set_vq_state = vdpasim_set_vq_state, 740 + .get_vendor_vq_stats = vdpasim_get_vq_stats, 661 741 .get_vq_state = vdpasim_get_vq_state, 662 742 .get_vq_align = vdpasim_get_vq_align, 663 743 .get_vq_group = vdpasim_get_vq_group, ··· 673 751 .set_status = vdpasim_set_status, 674 752 .reset = vdpasim_reset, 675 753 .suspend = vdpasim_suspend, 754 + .resume = vdpasim_resume, 676 755 .get_config_size = vdpasim_get_config_size, 677 756 .get_config = vdpasim_get_config, 678 757 .set_config = vdpasim_set_config,
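The `vdpasim_kick_vq()` and `vdpasim_resume()` changes above record a kick that arrives while the device is suspended with `DRIVER_OK` set, then replay it on every queue once the device resumes. The following is a minimal userspace sketch of that control flow only. The names `sim_dev`, `sim_kick` and `sim_resume` are hypothetical, the spinlock is omitted, and `schedule_work()` is replaced by a counter.

```c
#include <stdbool.h>

#define SIM_NVQS 3
#define SIM_S_DRIVER_OK 4 /* same value as VIRTIO_CONFIG_S_DRIVER_OK */

/* Hypothetical stand-in for struct vdpasim; only the fields the logic needs. */
struct sim_dev {
	bool running;
	bool pending_kick;
	unsigned int status;
	bool vq_ready[SIM_NVQS];
	int work_scheduled; /* counts what schedule_work() would have queued */
};

/* Mirrors the new check in vdpasim_kick_vq(): defer the kick while suspended. */
static void sim_kick(struct sim_dev *dev, int idx)
{
	if (!dev->running && (dev->status & SIM_S_DRIVER_OK)) {
		dev->pending_kick = true;
		return;
	}
	if (dev->vq_ready[idx])
		dev->work_scheduled++;
}

/* Mirrors vdpasim_resume(): kick every queue once if a kick was deferred. */
static void sim_resume(struct sim_dev *dev)
{
	dev->running = true;
	if (dev->pending_kick) {
		for (int i = 0; i < SIM_NVQS; i++)
			sim_kick(dev, i);
		dev->pending_kick = false;
	}
}
```

Because a single flag covers all queues, the replay kicks every queue rather than only the one that was kicked during suspension; queues that are not ready are skipped by the existing `vq->ready` test.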

+6 -1

drivers/vdpa/vdpa_sim/vdpa_sim.h

@@ -37,6 +37,7 @@
 	struct vdpa_mgmt_dev *mgmt_dev;
 	const char *name;
 	u64 supported_features;
+	size_t alloc_size;
 	size_t config_size;
 	size_t buffer_size;
 	int nvqs;
@@ -47,6 +48,9 @@
 	work_func_t work_fn;
 	void (*get_config)(struct vdpasim *vdpasim, void *config);
 	void (*set_config)(struct vdpasim *vdpasim, const void *config);
+	int (*get_stats)(struct vdpasim *vdpasim, u16 idx,
+			 struct sk_buff *msg,
+			 struct netlink_ext_ack *extack);
 };
 
 /* State of each vdpasim device */
@@ -60,13 +64,14 @@
 	/* virtio config according to device type */
 	void *config;
 	struct vhost_iotlb *iommu;
-	struct iova_domain iova;
+	bool *iommu_pt;
 	void *buffer;
 	u32 status;
 	u32 generation;
 	u64 features;
 	u32 groups;
 	bool running;
+	bool pending_kick;
 	/* spinlock to synchronize iommu table */
 	spinlock_t iommu_lock;
 };
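The new `alloc_size` attribute lets a device type ask the simulator core for a larger allocation, so that `struct vdpasim` can be embedded in a private structure such as `struct vdpasim_net` and recovered with `container_of()`. A minimal userspace sketch of that pattern follows. All names (`base_dev`, `net_dev`, `base_create`, `base_to_net`) are hypothetical, and the size check is stricter than the kernel's, which only rejects a zero `alloc_size`.

```c
#include <stddef.h>
#include <stdlib.h>

/* Userspace equivalent of the kernel's container_of() macro. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct base_dev {		/* plays the role of struct vdpasim */
	int nvqs;
};

struct net_dev {		/* plays the role of struct vdpasim_net */
	struct base_dev base;	/* must be the first member in this sketch */
	unsigned long long tx_pkts;
};

/* Generic constructor: allocates alloc_size bytes chosen by the caller. */
static struct base_dev *base_create(size_t alloc_size, int nvqs)
{
	struct base_dev *dev;

	if (alloc_size < sizeof(*dev))
		return NULL;	/* caller did not set a usable alloc_size */
	dev = calloc(1, alloc_size);
	if (dev)
		dev->nvqs = nvqs;
	return dev;
}

/* Recover the device-specific structure from the generic one. */
static struct net_dev *base_to_net(struct base_dev *dev)
{
	return container_of(dev, struct net_dev, base);
}
```

A caller would pass `sizeof(struct net_dev)` as `alloc_size`, which is the same shape as `dev_attr.alloc_size = sizeof(struct vdpasim_net)` in the network simulator.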

+1

drivers/vdpa/vdpa_sim/vdpa_sim_blk.c

@@ -378,6 +378,7 @@
 	dev_attr.nvqs = VDPASIM_BLK_VQ_NUM;
 	dev_attr.ngroups = VDPASIM_BLK_GROUP_NUM;
 	dev_attr.nas = VDPASIM_BLK_AS_NUM;
+	dev_attr.alloc_size = sizeof(struct vdpasim);
 	dev_attr.config_size = sizeof(struct virtio_blk_config);
 	dev_attr.get_config = vdpasim_blk_get_config;
 	dev_attr.work_fn = vdpasim_blk_work;

+214 -5

drivers/vdpa/vdpa_sim/vdpa_sim_net.c

··· 15 15 #include <linux/etherdevice.h> 16 16 #include <linux/vringh.h> 17 17 #include <linux/vdpa.h> 18 + #include <net/netlink.h> 18 19 #include <uapi/linux/virtio_net.h> 19 20 #include <uapi/linux/vdpa.h> 20 21 ··· 28 27 29 28 #define VDPASIM_NET_FEATURES (VDPASIM_FEATURES | \ 30 29 (1ULL << VIRTIO_NET_F_MAC) | \ 30 + (1ULL << VIRTIO_NET_F_STATUS) | \ 31 31 (1ULL << VIRTIO_NET_F_MTU) | \ 32 32 (1ULL << VIRTIO_NET_F_CTRL_VQ) | \ 33 33 (1ULL << VIRTIO_NET_F_CTRL_MAC_ADDR)) ··· 37 35 #define VDPASIM_NET_VQ_NUM 3 38 36 #define VDPASIM_NET_AS_NUM 2 39 37 #define VDPASIM_NET_GROUP_NUM 2 38 + 39 + struct vdpasim_dataq_stats { 40 + struct u64_stats_sync syncp; 41 + u64 pkts; 42 + u64 bytes; 43 + u64 drops; 44 + u64 errors; 45 + u64 overruns; 46 + }; 47 + 48 + struct vdpasim_cq_stats { 49 + struct u64_stats_sync syncp; 50 + u64 requests; 51 + u64 successes; 52 + u64 errors; 53 + }; 54 + 55 + struct vdpasim_net{ 56 + struct vdpasim vdpasim; 57 + struct vdpasim_dataq_stats tx_stats; 58 + struct vdpasim_dataq_stats rx_stats; 59 + struct vdpasim_cq_stats cq_stats; 60 + }; 61 + 62 + static struct vdpasim_net *sim_to_net(struct vdpasim *vdpasim) 63 + { 64 + return container_of(vdpasim, struct vdpasim_net, vdpasim); 65 + } 40 66 41 67 static void vdpasim_net_complete(struct vdpasim_virtqueue *vq, size_t len) 42 68 { ··· 126 96 static void vdpasim_handle_cvq(struct vdpasim *vdpasim) 127 97 { 128 98 struct vdpasim_virtqueue *cvq = &vdpasim->vqs[2]; 99 + struct vdpasim_net *net = sim_to_net(vdpasim); 129 100 virtio_net_ctrl_ack status = VIRTIO_NET_ERR; 130 101 struct virtio_net_ctrl_hdr ctrl; 131 102 size_t read, write; 103 + u64 requests = 0, errors = 0, successes = 0; 132 104 int err; 133 105 134 106 if (!(vdpasim->features & (1ULL << VIRTIO_NET_F_CTRL_VQ))) ··· 146 114 if (err <= 0) 147 115 break; 148 116 117 + ++requests; 149 118 read = vringh_iov_pull_iotlb(&cvq->vring, &cvq->in_iov, &ctrl, 150 119 sizeof(ctrl)); 151 - if (read != sizeof(ctrl)) 120 + if (read != 
sizeof(ctrl)) { 121 + ++errors; 152 122 break; 123 + } 153 124 154 125 switch (ctrl.class) { 155 126 case VIRTIO_NET_CTRL_MAC: ··· 161 126 default: 162 127 break; 163 128 } 129 + 130 + if (status == VIRTIO_NET_OK) 131 + ++successes; 132 + else 133 + ++errors; 164 134 165 135 /* Make sure data is wrote before advancing index */ 166 136 smp_wmb(); ··· 184 144 cvq->cb(cvq->private); 185 145 local_bh_enable(); 186 146 } 147 + 148 + u64_stats_update_begin(&net->cq_stats.syncp); 149 + net->cq_stats.requests += requests; 150 + net->cq_stats.errors += errors; 151 + net->cq_stats.successes += successes; 152 + u64_stats_update_end(&net->cq_stats.syncp); 187 153 } 188 154 189 155 static void vdpasim_net_work(struct work_struct *work) ··· 197 151 struct vdpasim *vdpasim = container_of(work, struct vdpasim, work); 198 152 struct vdpasim_virtqueue *txq = &vdpasim->vqs[1]; 199 153 struct vdpasim_virtqueue *rxq = &vdpasim->vqs[0]; 154 + struct vdpasim_net *net = sim_to_net(vdpasim); 200 155 ssize_t read, write; 201 - int pkts = 0; 156 + u64 tx_pkts = 0, rx_pkts = 0, tx_bytes = 0, rx_bytes = 0; 157 + u64 rx_drops = 0, rx_overruns = 0, rx_errors = 0, tx_errors = 0; 202 158 int err; 203 159 204 160 spin_lock(&vdpasim->lock); ··· 219 171 while (true) { 220 172 err = vringh_getdesc_iotlb(&txq->vring, &txq->out_iov, NULL, 221 173 &txq->head, GFP_ATOMIC); 222 - if (err <= 0) 174 + if (err <= 0) { 175 + if (err) 176 + ++tx_errors; 223 177 break; 178 + } 224 179 180 + ++tx_pkts; 225 181 read = vringh_iov_pull_iotlb(&txq->vring, &txq->out_iov, 226 182 vdpasim->buffer, 227 183 PAGE_SIZE); 228 184 185 + tx_bytes += read; 186 + 229 187 if (!receive_filter(vdpasim, read)) { 188 + ++rx_drops; 230 189 vdpasim_net_complete(txq, 0); 231 190 continue; 232 191 } ··· 241 186 err = vringh_getdesc_iotlb(&rxq->vring, NULL, &rxq->in_iov, 242 187 &rxq->head, GFP_ATOMIC); 243 188 if (err <= 0) { 189 + ++rx_overruns; 244 190 vdpasim_net_complete(txq, 0); 245 191 break; 246 192 } 247 193 248 194 write = 
vringh_iov_push_iotlb(&rxq->vring, &rxq->in_iov, 249 195 vdpasim->buffer, read); 250 - if (write <= 0) 196 + if (write <= 0) { 197 + ++rx_errors; 251 198 break; 199 + } 200 + 201 + ++rx_pkts; 202 + rx_bytes += write; 252 203 253 204 vdpasim_net_complete(txq, 0); 254 205 vdpasim_net_complete(rxq, write); 255 206 256 - if (++pkts > 4) { 207 + if (tx_pkts > 4) { 257 208 schedule_work(&vdpasim->work); 258 209 goto out; 259 210 } ··· 267 206 268 207 out: 269 208 spin_unlock(&vdpasim->lock); 209 + 210 + u64_stats_update_begin(&net->tx_stats.syncp); 211 + net->tx_stats.pkts += tx_pkts; 212 + net->tx_stats.bytes += tx_bytes; 213 + net->tx_stats.errors += tx_errors; 214 + u64_stats_update_end(&net->tx_stats.syncp); 215 + 216 + u64_stats_update_begin(&net->rx_stats.syncp); 217 + net->rx_stats.pkts += rx_pkts; 218 + net->rx_stats.bytes += rx_bytes; 219 + net->rx_stats.drops += rx_drops; 220 + net->rx_stats.errors += rx_errors; 221 + net->rx_stats.overruns += rx_overruns; 222 + u64_stats_update_end(&net->rx_stats.syncp); 223 + } 224 + 225 + static int vdpasim_net_get_stats(struct vdpasim *vdpasim, u16 idx, 226 + struct sk_buff *msg, 227 + struct netlink_ext_ack *extack) 228 + { 229 + struct vdpasim_net *net = sim_to_net(vdpasim); 230 + u64 rx_pkts, rx_bytes, rx_errors, rx_overruns, rx_drops; 231 + u64 tx_pkts, tx_bytes, tx_errors, tx_drops; 232 + u64 cq_requests, cq_successes, cq_errors; 233 + unsigned int start; 234 + int err = -EMSGSIZE; 235 + 236 + switch(idx) { 237 + case 0: 238 + do { 239 + start = u64_stats_fetch_begin(&net->rx_stats.syncp); 240 + rx_pkts = net->rx_stats.pkts; 241 + rx_bytes = net->rx_stats.bytes; 242 + rx_errors = net->rx_stats.errors; 243 + rx_overruns = net->rx_stats.overruns; 244 + rx_drops = net->rx_stats.drops; 245 + } while (u64_stats_fetch_retry(&net->rx_stats.syncp, start)); 246 + 247 + if (nla_put_string(msg, VDPA_ATTR_DEV_VENDOR_ATTR_NAME, 248 + "rx packets")) 249 + break; 250 + if (nla_put_u64_64bit(msg, VDPA_ATTR_DEV_VENDOR_ATTR_VALUE, 251 + 
rx_pkts, VDPA_ATTR_PAD)) 252 + break; 253 + if (nla_put_string(msg, VDPA_ATTR_DEV_VENDOR_ATTR_NAME, 254 + "rx bytes")) 255 + break; 256 + if (nla_put_u64_64bit(msg, VDPA_ATTR_DEV_VENDOR_ATTR_VALUE, 257 + rx_bytes, VDPA_ATTR_PAD)) 258 + break; 259 + if (nla_put_string(msg, VDPA_ATTR_DEV_VENDOR_ATTR_NAME, 260 + "rx errors")) 261 + break; 262 + if (nla_put_u64_64bit(msg, VDPA_ATTR_DEV_VENDOR_ATTR_VALUE, 263 + rx_errors, VDPA_ATTR_PAD)) 264 + break; 265 + if (nla_put_string(msg, VDPA_ATTR_DEV_VENDOR_ATTR_NAME, 266 + "rx overruns")) 267 + break; 268 + if (nla_put_u64_64bit(msg, VDPA_ATTR_DEV_VENDOR_ATTR_VALUE, 269 + rx_overruns, VDPA_ATTR_PAD)) 270 + break; 271 + if (nla_put_string(msg, VDPA_ATTR_DEV_VENDOR_ATTR_NAME, 272 + "rx drops")) 273 + break; 274 + if (nla_put_u64_64bit(msg, VDPA_ATTR_DEV_VENDOR_ATTR_VALUE, 275 + rx_drops, VDPA_ATTR_PAD)) 276 + break; 277 + err = 0; 278 + break; 279 + case 1: 280 + do { 281 + start = u64_stats_fetch_begin(&net->tx_stats.syncp); 282 + tx_pkts = net->tx_stats.pkts; 283 + tx_bytes = net->tx_stats.bytes; 284 + tx_errors = net->tx_stats.errors; 285 + tx_drops = net->tx_stats.drops; 286 + } while (u64_stats_fetch_retry(&net->tx_stats.syncp, start)); 287 + 288 + if (nla_put_string(msg, VDPA_ATTR_DEV_VENDOR_ATTR_NAME, 289 + "tx packets")) 290 + break; 291 + if (nla_put_u64_64bit(msg, VDPA_ATTR_DEV_VENDOR_ATTR_VALUE, 292 + tx_pkts, VDPA_ATTR_PAD)) 293 + break; 294 + if (nla_put_string(msg, VDPA_ATTR_DEV_VENDOR_ATTR_NAME, 295 + "tx bytes")) 296 + break; 297 + if (nla_put_u64_64bit(msg, VDPA_ATTR_DEV_VENDOR_ATTR_VALUE, 298 + tx_bytes, VDPA_ATTR_PAD)) 299 + break; 300 + if (nla_put_string(msg, VDPA_ATTR_DEV_VENDOR_ATTR_NAME, 301 + "tx errors")) 302 + break; 303 + if (nla_put_u64_64bit(msg, VDPA_ATTR_DEV_VENDOR_ATTR_VALUE, 304 + tx_errors, VDPA_ATTR_PAD)) 305 + break; 306 + if (nla_put_string(msg, VDPA_ATTR_DEV_VENDOR_ATTR_NAME, 307 + "tx drops")) 308 + break; 309 + if (nla_put_u64_64bit(msg, VDPA_ATTR_DEV_VENDOR_ATTR_VALUE, 310 + tx_drops, 
VDPA_ATTR_PAD)) 311 + break; 312 + err = 0; 313 + break; 314 + case 2: 315 + do { 316 + start = u64_stats_fetch_begin(&net->cq_stats.syncp); 317 + cq_requests = net->cq_stats.requests; 318 + cq_successes = net->cq_stats.successes; 319 + cq_errors = net->cq_stats.errors; 320 + } while (u64_stats_fetch_retry(&net->cq_stats.syncp, start)); 321 + 322 + if (nla_put_string(msg, VDPA_ATTR_DEV_VENDOR_ATTR_NAME, 323 + "cvq requests")) 324 + break; 325 + if (nla_put_u64_64bit(msg, VDPA_ATTR_DEV_VENDOR_ATTR_VALUE, 326 + cq_requests, VDPA_ATTR_PAD)) 327 + break; 328 + if (nla_put_string(msg, VDPA_ATTR_DEV_VENDOR_ATTR_NAME, 329 + "cvq successes")) 330 + break; 331 + if (nla_put_u64_64bit(msg, VDPA_ATTR_DEV_VENDOR_ATTR_VALUE, 332 + cq_successes, VDPA_ATTR_PAD)) 333 + break; 334 + if (nla_put_string(msg, VDPA_ATTR_DEV_VENDOR_ATTR_NAME, 335 + "cvq errors")) 336 + break; 337 + if (nla_put_u64_64bit(msg, VDPA_ATTR_DEV_VENDOR_ATTR_VALUE, 338 + cq_errors, VDPA_ATTR_PAD)) 339 + break; 340 + err = 0; 341 + break; 342 + default: 343 + err = -EINVAL; 344 + break; 345 + } 346 + 347 + return err; 270 348 } 271 349 272 350 static void vdpasim_net_get_config(struct vdpasim *vdpasim, void *config) ··· 442 242 const struct vdpa_dev_set_config *config) 443 243 { 444 244 struct vdpasim_dev_attr dev_attr = {}; 245 + struct vdpasim_net *net; 445 246 struct vdpasim *simdev; 446 247 int ret; 447 248 ··· 453 252 dev_attr.nvqs = VDPASIM_NET_VQ_NUM; 454 253 dev_attr.ngroups = VDPASIM_NET_GROUP_NUM; 455 254 dev_attr.nas = VDPASIM_NET_AS_NUM; 255 + dev_attr.alloc_size = sizeof(struct vdpasim_net); 456 256 dev_attr.config_size = sizeof(struct virtio_net_config); 457 257 dev_attr.get_config = vdpasim_net_get_config; 458 258 dev_attr.work_fn = vdpasim_net_work; 259 + dev_attr.get_stats = vdpasim_net_get_stats; 459 260 dev_attr.buffer_size = PAGE_SIZE; 460 261 461 262 simdev = vdpasim_create(&dev_attr, config); ··· 469 266 ret = _vdpa_register_device(&simdev->vdpa, VDPASIM_NET_VQ_NUM); 470 267 if (ret) 471 
268 goto reg_err; 269 + 270 + net = sim_to_net(simdev); 271 + 272 + u64_stats_init(&net->tx_stats.syncp); 273 + u64_stats_init(&net->rx_stats.syncp); 274 + u64_stats_init(&net->cq_stats.syncp); 472 275 473 276 return 0; 474 277
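The statistics code above accumulates counters in local variables, publishes them inside a `u64_stats_update_begin()`/`u64_stats_update_end()` pair, and reads them in a `u64_stats_fetch_begin()`/`u64_stats_fetch_retry()` loop. The sketch below imitates that writer/reader shape in userspace with a sequence counter. It is an illustration under assumptions, not the kernel implementation: every name is hypothetical, the counter fields are plain variables, and the code is only meant to be exercised from a single thread.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace analogue of struct u64_stats_sync. */
struct stats_sync {
	atomic_uint seq;	/* odd while an update is in progress */
};

struct dataq_stats {
	struct stats_sync syncp;
	uint64_t pkts;
	uint64_t bytes;
};

static void stats_update_begin(struct stats_sync *s)
{
	atomic_fetch_add_explicit(&s->seq, 1, memory_order_acq_rel);
}

static void stats_update_end(struct stats_sync *s)
{
	atomic_fetch_add_explicit(&s->seq, 1, memory_order_release);
}

static unsigned int stats_fetch_begin(struct stats_sync *s)
{
	return atomic_load_explicit(&s->seq, memory_order_acquire);
}

/* True if the snapshot may be inconsistent and must be read again. */
static bool stats_fetch_retry(struct stats_sync *s, unsigned int start)
{
	atomic_thread_fence(memory_order_acquire);
	return (start & 1) ||
	       atomic_load_explicit(&s->seq, memory_order_relaxed) != start;
}

/* Writer: publish locally accumulated totals in one update section. */
static void stats_add(struct dataq_stats *st, uint64_t pkts, uint64_t bytes)
{
	stats_update_begin(&st->syncp);
	st->pkts += pkts;
	st->bytes += bytes;
	stats_update_end(&st->syncp);
}

/* Reader: loop until both counters come from the same update. */
static void stats_read(struct dataq_stats *st, uint64_t *pkts, uint64_t *bytes)
{
	unsigned int start;

	do {
		start = stats_fetch_begin(&st->syncp);
		*pkts = st->pkts;
		*bytes = st->bytes;
	} while (stats_fetch_retry(&st->syncp, start));
}
```

Accumulating in locals first, as `vdpasim_net_work()` does, keeps the update section short: the counters are touched once per work invocation instead of once per packet.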

+3 -2

drivers/vhost/net.c

@@ -73,7 +73,8 @@
 	VHOST_NET_FEATURES = VHOST_FEATURES |
 			 (1ULL << VHOST_NET_F_VIRTIO_NET_HDR) |
 			 (1ULL << VIRTIO_NET_F_MRG_RXBUF) |
-			 (1ULL << VIRTIO_F_ACCESS_PLATFORM)
+			 (1ULL << VIRTIO_F_ACCESS_PLATFORM) |
+			 (1ULL << VIRTIO_F_RING_RESET)
 };
 
 enum {
@@ -1645,7 +1646,7 @@
 		goto out_unlock;
 
 	if ((features & (1ULL << VIRTIO_F_ACCESS_PLATFORM))) {
-		if (vhost_init_device_iotlb(&n->dev, true))
+		if (vhost_init_device_iotlb(&n->dev))
 			goto out_unlock;
 	}
 

+3 -3

drivers/vhost/scsi.c

@@ -2105,7 +2105,7 @@
 	struct vhost_scsi_tpg *tpg = container_of(se_tpg,
 				struct vhost_scsi_tpg, se_tpg);
 
-	return sprintf(page, "%d\n", tpg->tv_fabric_prot_type);
+	return sysfs_emit(page, "%d\n", tpg->tv_fabric_prot_type);
 }
 
 CONFIGFS_ATTR(vhost_scsi_tpg_attrib_, fabric_prot_type);
@@ -2215,7 +2215,7 @@
 		mutex_unlock(&tpg->tv_tpg_mutex);
 		return -ENODEV;
 	}
-	ret = snprintf(page, PAGE_SIZE, "%s\n",
+	ret = sysfs_emit(page, "%s\n",
 			tv_nexus->tvn_se_sess->se_node_acl->initiatorname);
 	mutex_unlock(&tpg->tv_tpg_mutex);
 
@@ -2440,7 +2440,7 @@
 static ssize_t
 vhost_scsi_wwn_version_show(struct config_item *item, char *page)
 {
-	return sprintf(page, "TCM_VHOST fabric module %s on %s/%s"
+	return sysfs_emit(page, "TCM_VHOST fabric module %s on %s/%s"
 		"on "UTS_RELEASE"\n", VHOST_SCSI_VERSION, utsname()->sysname,
 		utsname()->machine);
 }
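The hunks above replace unbounded `sprintf()` calls, and an `snprintf()` that returns the would-be length, with `sysfs_emit()`, which limits output to one page and returns the number of bytes actually stored. The userspace sketch below shows that return-value difference only. The helper name `emit_page` and the 4096-byte page size are assumptions made for illustration, and the page-alignment check that `sysfs_emit()` performs is not reproduced.

```c
#include <stdarg.h>
#include <stdio.h>

#define SKETCH_PAGE_SIZE 4096 /* assumed page size for the sketch */

/* Hypothetical bounded formatter in the spirit of sysfs_emit(). */
static int emit_page(char *page, const char *fmt, ...)
{
	va_list args;
	int len;

	va_start(args, fmt);
	len = vsnprintf(page, SKETCH_PAGE_SIZE, fmt, args);
	va_end(args);

	if (len < 0)
		return 0;
	/* Report what was stored, not what would have been written. */
	if (len >= SKETCH_PAGE_SIZE)
		len = SKETCH_PAGE_SIZE - 1;
	return len;
}
```

Returning the stored length matters for `show()` callbacks: the caller copies that many bytes to userspace, so a would-be length larger than the buffer must never be reported.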

-3

drivers/vhost/test.c

@@ -333,13 +333,10 @@
 			return -EFAULT;
 		return 0;
 	case VHOST_SET_FEATURES:
-		printk(KERN_ERR "1\n");
 		if (copy_from_user(&features, featurep, sizeof features))
 			return -EFAULT;
-		printk(KERN_ERR "2\n");
 		if (features & ~VHOST_FEATURES)
 			return -EOPNOTSUPP;
-		printk(KERN_ERR "3\n");
 		return vhost_test_set_features(n, features);
 	case VHOST_RESET_OWNER:
 		return vhost_test_reset_owner(n);

+37 -2

drivers/vhost/vdpa.c

··· 359 359 return ops->suspend; 360 360 } 361 361 362 + static bool vhost_vdpa_can_resume(const struct vhost_vdpa *v) 363 + { 364 + struct vdpa_device *vdpa = v->vdpa; 365 + const struct vdpa_config_ops *ops = vdpa->config; 366 + 367 + return ops->resume; 368 + } 369 + 362 370 static long vhost_vdpa_get_features(struct vhost_vdpa *v, u64 __user *featurep) 363 371 { 364 372 struct vdpa_device *vdpa = v->vdpa; ··· 506 498 return ops->suspend(vdpa); 507 499 } 508 500 501 + /* After a successful return of this ioctl the device resumes processing 502 + * virtqueue descriptors. The device becomes fully operational the same way it 503 + * was before it was suspended. 504 + */ 505 + static long vhost_vdpa_resume(struct vhost_vdpa *v) 506 + { 507 + struct vdpa_device *vdpa = v->vdpa; 508 + const struct vdpa_config_ops *ops = vdpa->config; 509 + 510 + if (!ops->resume) 511 + return -EOPNOTSUPP; 512 + 513 + return ops->resume(vdpa); 514 + } 515 + 509 516 static long vhost_vdpa_vring_ioctl(struct vhost_vdpa *v, unsigned int cmd, 510 517 void __user *argp) 511 518 { ··· 629 606 if (copy_from_user(&features, featurep, sizeof(features))) 630 607 return -EFAULT; 631 608 if (features & ~(VHOST_VDPA_BACKEND_FEATURES | 632 - BIT_ULL(VHOST_BACKEND_F_SUSPEND))) 609 + BIT_ULL(VHOST_BACKEND_F_SUSPEND) | 610 + BIT_ULL(VHOST_BACKEND_F_RESUME))) 633 611 return -EOPNOTSUPP; 634 612 if ((features & BIT_ULL(VHOST_BACKEND_F_SUSPEND)) && 635 613 !vhost_vdpa_can_suspend(v)) 614 + return -EOPNOTSUPP; 615 + if ((features & BIT_ULL(VHOST_BACKEND_F_RESUME)) && 616 + !vhost_vdpa_can_resume(v)) 636 617 return -EOPNOTSUPP; 637 618 vhost_set_backend_features(&v->vdev, features); 638 619 return 0; ··· 689 662 features = VHOST_VDPA_BACKEND_FEATURES; 690 663 if (vhost_vdpa_can_suspend(v)) 691 664 features |= BIT_ULL(VHOST_BACKEND_F_SUSPEND); 665 + if (vhost_vdpa_can_resume(v)) 666 + features |= BIT_ULL(VHOST_BACKEND_F_RESUME); 692 667 if (copy_to_user(featurep, &features, sizeof(features))) 693 668 r = 
-EFAULT; 694 669 break; ··· 705 676 break; 706 677 case VHOST_VDPA_SUSPEND: 707 678 r = vhost_vdpa_suspend(v); 679 + break; 680 + case VHOST_VDPA_RESUME: 681 + r = vhost_vdpa_resume(v); 708 682 break; 709 683 default: 710 684 r = vhost_dev_ioctl(&v->vdev, cmd, argp); ··· 1151 1119 if (!bus) 1152 1120 return -EFAULT; 1153 1121 1154 - if (!device_iommu_capable(dma_dev, IOMMU_CAP_CACHE_COHERENCY)) 1122 + if (!device_iommu_capable(dma_dev, IOMMU_CAP_CACHE_COHERENCY)) { 1123 + dev_warn_once(&v->dev, 1124 + "Failed to allocate domain, device is not IOMMU cache coherent capable\n"); 1155 1125 return -ENOTSUPP; 1126 + } 1156 1127 1157 1128 v->domain = iommu_domain_alloc(bus); 1158 1129 if (!v->domain)
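The vhost-vdpa hunks above advertise `VHOST_BACKEND_F_RESUME` only when the parent driver implements the `resume` op, and reject an attempt to set the bit otherwise, mirroring the existing suspend handling. A minimal userspace sketch of that negotiation follows. The structure, function names and bit numbers are assumptions chosen for illustration; the authoritative bit values live in the vhost uapi headers.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Bit numbers are assumptions for this sketch, not taken from the diff. */
#define F_IOTLB_MSG_V2 1
#define F_SUSPEND 4
#define F_RESUME 5
#define BIT_ULL(n) (1ULL << (n))

/* Hypothetical reduction of struct vdpa_config_ops to the two optional ops. */
struct backend_ops {
	int (*suspend)(void);
	int (*resume)(void);
};

static int sketch_noop(void)
{
	return 0;
}

/* Advertise optional features only when the matching op exists. */
static uint64_t backend_get_features(const struct backend_ops *ops,
				     uint64_t base)
{
	uint64_t features = base;

	if (ops->suspend)
		features |= BIT_ULL(F_SUSPEND);
	if (ops->resume)
		features |= BIT_ULL(F_RESUME);
	return features;
}

/* Reject unknown bits, then bits whose op is missing. */
static int backend_set_features(const struct backend_ops *ops, uint64_t base,
				uint64_t features)
{
	if (features & ~(base | BIT_ULL(F_SUSPEND) | BIT_ULL(F_RESUME)))
		return -EOPNOTSUPP;
	if ((features & BIT_ULL(F_SUSPEND)) && !ops->suspend)
		return -EOPNOTSUPP;
	if ((features & BIT_ULL(F_RESUME)) && !ops->resume)
		return -EOPNOTSUPP;
	return 0;
}
```

Checking on both the get and the set path means userspace cannot enable resume on a backend that would later return `-EOPNOTSUPP` from the `VHOST_VDPA_RESUME` ioctl.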

+1 -1

drivers/vhost/vhost.c

@@ -1730,7 +1730,7 @@
 }
 EXPORT_SYMBOL_GPL(vhost_vring_ioctl);
 
-int vhost_init_device_iotlb(struct vhost_dev *d, bool enabled)
+int vhost_init_device_iotlb(struct vhost_dev *d)
 {
 	struct vhost_iotlb *niotlb, *oiotlb;
 	int i;

+1 -1

drivers/vhost/vhost.h

@@ -222,7 +222,7 @@
 			int noblock);
 ssize_t vhost_chr_write_iter(struct vhost_dev *dev,
 			     struct iov_iter *from);
-int vhost_init_device_iotlb(struct vhost_dev *d, bool enabled);
+int vhost_init_device_iotlb(struct vhost_dev *d);
 
 void vhost_iotlb_map_free(struct vhost_iotlb *iotlb,
 			  struct vhost_iotlb_map *map);

+1 -1

drivers/vhost/vsock.c

@@ -793,7 +793,7 @@
 	}
 
 	if ((features & (1ULL << VIRTIO_F_ACCESS_PLATFORM))) {
-		if (vhost_init_device_iotlb(&vsock->dev, true))
+		if (vhost_init_device_iotlb(&vsock->dev))
 			goto err;
 	}
 

+93 -40

drivers/virtio/virtio_ring.c

··· 202 202 /* DMA, allocation, and size information */ 203 203 bool we_own_ring; 204 204 205 + /* Device used for doing DMA */ 206 + struct device *dma_dev; 207 + 205 208 #ifdef DEBUG 206 209 /* They're supposed to lock for us. */ 207 210 unsigned int in_use; ··· 222 219 bool context, 223 220 bool (*notify)(struct virtqueue *), 224 221 void (*callback)(struct virtqueue *), 225 - const char *name); 222 + const char *name, 223 + struct device *dma_dev); 226 224 static struct vring_desc_extra *vring_alloc_desc_extra(unsigned int num); 227 225 static void vring_free(struct virtqueue *_vq); 228 226 ··· 301 297 EXPORT_SYMBOL_GPL(virtio_max_dma_size); 302 298 303 299 static void *vring_alloc_queue(struct virtio_device *vdev, size_t size, 304 - dma_addr_t *dma_handle, gfp_t flag) 300 + dma_addr_t *dma_handle, gfp_t flag, 301 + struct device *dma_dev) 305 302 { 306 303 if (vring_use_dma_api(vdev)) { 307 - return dma_alloc_coherent(vdev->dev.parent, size, 304 + return dma_alloc_coherent(dma_dev, size, 308 305 dma_handle, flag); 309 306 } else { 310 307 void *queue = alloc_pages_exact(PAGE_ALIGN(size), flag); ··· 335 330 } 336 331 337 332 static void vring_free_queue(struct virtio_device *vdev, size_t size, 338 - void *queue, dma_addr_t dma_handle) 333 + void *queue, dma_addr_t dma_handle, 334 + struct device *dma_dev) 339 335 { 340 336 if (vring_use_dma_api(vdev)) 341 - dma_free_coherent(vdev->dev.parent, size, queue, dma_handle); 337 + dma_free_coherent(dma_dev, size, queue, dma_handle); 342 338 else 343 339 free_pages_exact(queue, PAGE_ALIGN(size)); 344 340 } ··· 347 341 /* 348 342 * The DMA ops on various arches are rather gnarly right now, and 349 343 * making all of the arch DMA ops work on the vring device itself 350 - * is a mess. For now, we use the parent device for DMA ops. 344 + * is a mess. 
351 345 */ 352 346 static inline struct device *vring_dma_dev(const struct vring_virtqueue *vq) 353 347 { 354 - return vq->vq.vdev->dev.parent; 348 + return vq->dma_dev; 355 349 } 356 350 357 351 /* Map one sg entry. */ ··· 1038 1032 } 1039 1033 1040 1034 static void vring_free_split(struct vring_virtqueue_split *vring_split, 1041 - struct virtio_device *vdev) 1035 + struct virtio_device *vdev, struct device *dma_dev) 1042 1036 { 1043 1037 vring_free_queue(vdev, vring_split->queue_size_in_bytes, 1044 1038 vring_split->vring.desc, 1045 - vring_split->queue_dma_addr); 1039 + vring_split->queue_dma_addr, 1040 + dma_dev); 1046 1041 1047 1042 kfree(vring_split->desc_state); 1048 1043 kfree(vring_split->desc_extra); ··· 1053 1046 struct virtio_device *vdev, 1054 1047 u32 num, 1055 1048 unsigned int vring_align, 1056 - bool may_reduce_num) 1049 + bool may_reduce_num, 1050 + struct device *dma_dev) 1057 1051 { 1058 1052 void *queue = NULL; 1059 1053 dma_addr_t dma_addr; ··· 1069 1061 for (; num && vring_size(num, vring_align) > PAGE_SIZE; num /= 2) { 1070 1062 queue = vring_alloc_queue(vdev, vring_size(num, vring_align), 1071 1063 &dma_addr, 1072 - GFP_KERNEL | __GFP_NOWARN | __GFP_ZERO); 1064 + GFP_KERNEL | __GFP_NOWARN | __GFP_ZERO, 1065 + dma_dev); 1073 1066 if (queue) 1074 1067 break; 1075 1068 if (!may_reduce_num) ··· 1083 1074 if (!queue) { 1084 1075 /* Try to get a single page. You are my only hope! 
*/ 1085 1076 queue = vring_alloc_queue(vdev, vring_size(num, vring_align), 1086 - &dma_addr, GFP_KERNEL | __GFP_ZERO); 1077 + &dma_addr, GFP_KERNEL | __GFP_ZERO, 1078 + dma_dev); 1087 1079 } 1088 1080 if (!queue) 1089 1081 return -ENOMEM; ··· 1110 1100 bool context, 1111 1101 bool (*notify)(struct virtqueue *), 1112 1102 void (*callback)(struct virtqueue *), 1113 - const char *name) 1103 + const char *name, 1104 + struct device *dma_dev) 1114 1105 { 1115 1106 struct vring_virtqueue_split vring_split = {}; 1116 1107 struct virtqueue *vq; 1117 1108 int err; 1118 1109 1119 1110 err = vring_alloc_queue_split(&vring_split, vdev, num, vring_align, 1120 - may_reduce_num); 1111 + may_reduce_num, dma_dev); 1121 1112 if (err) 1122 1113 return NULL; 1123 1114 1124 1115 vq = __vring_new_virtqueue(index, &vring_split, vdev, weak_barriers, 1125 - context, notify, callback, name); 1116 + context, notify, callback, name, dma_dev); 1126 1117 if (!vq) { 1127 - vring_free_split(&vring_split, vdev); 1118 + vring_free_split(&vring_split, vdev, dma_dev); 1128 1119 return NULL; 1129 1120 } 1130 1121 ··· 1143 1132 1144 1133 err = vring_alloc_queue_split(&vring_split, vdev, num, 1145 1134 vq->split.vring_align, 1146 - vq->split.may_reduce_num); 1135 + vq->split.may_reduce_num, 1136 + vring_dma_dev(vq)); 1147 1137 if (err) 1148 1138 goto err; 1149 1139 ··· 1162 1150 return 0; 1163 1151 1164 1152 err_state_extra: 1165 - vring_free_split(&vring_split, vdev); 1153 + vring_free_split(&vring_split, vdev, vring_dma_dev(vq)); 1166 1154 err: 1167 1155 virtqueue_reinit_split(vq); 1168 1156 return -ENOMEM; ··· 1853 1841 } 1854 1842 1855 1843 static void vring_free_packed(struct vring_virtqueue_packed *vring_packed, 1856 - struct virtio_device *vdev) 1844 + struct virtio_device *vdev, 1845 + struct device *dma_dev) 1857 1846 { 1858 1847 if (vring_packed->vring.desc) 1859 1848 vring_free_queue(vdev, vring_packed->ring_size_in_bytes, 1860 1849 vring_packed->vring.desc, 1861 - 
vring_packed->ring_dma_addr); 1850 + vring_packed->ring_dma_addr, 1851 + dma_dev); 1862 1852 1863 1853 if (vring_packed->vring.driver) 1864 1854 vring_free_queue(vdev, vring_packed->event_size_in_bytes, 1865 1855 vring_packed->vring.driver, 1866 - vring_packed->driver_event_dma_addr); 1856 + vring_packed->driver_event_dma_addr, 1857 + dma_dev); 1867 1858 1868 1859 if (vring_packed->vring.device) 1869 1860 vring_free_queue(vdev, vring_packed->event_size_in_bytes, 1870 1861 vring_packed->vring.device, 1871 - vring_packed->device_event_dma_addr); 1862 + vring_packed->device_event_dma_addr, 1863 + dma_dev); 1872 1864 1873 1865 kfree(vring_packed->desc_state); 1874 1866 kfree(vring_packed->desc_extra); ··· 1880 1864 1881 1865 static int vring_alloc_queue_packed(struct vring_virtqueue_packed *vring_packed, 1882 1866 struct virtio_device *vdev, 1883 - u32 num) 1867 + u32 num, struct device *dma_dev) 1884 1868 { 1885 1869 struct vring_packed_desc *ring; 1886 1870 struct vring_packed_desc_event *driver, *device; ··· 1891 1875 1892 1876 ring = vring_alloc_queue(vdev, ring_size_in_bytes, 1893 1877 &ring_dma_addr, 1894 - GFP_KERNEL | __GFP_NOWARN | __GFP_ZERO); 1878 + GFP_KERNEL | __GFP_NOWARN | __GFP_ZERO, 1879 + dma_dev); 1895 1880 if (!ring) 1896 1881 goto err; 1897 1882 ··· 1904 1887 1905 1888 driver = vring_alloc_queue(vdev, event_size_in_bytes, 1906 1889 &driver_event_dma_addr, 1907 - GFP_KERNEL | __GFP_NOWARN | __GFP_ZERO); 1890 + GFP_KERNEL | __GFP_NOWARN | __GFP_ZERO, 1891 + dma_dev); 1908 1892 if (!driver) 1909 1893 goto err; 1910 1894 ··· 1915 1897 1916 1898 device = vring_alloc_queue(vdev, event_size_in_bytes, 1917 1899 &device_event_dma_addr, 1918 - GFP_KERNEL | __GFP_NOWARN | __GFP_ZERO); 1900 + GFP_KERNEL | __GFP_NOWARN | __GFP_ZERO, 1901 + dma_dev); 1919 1902 if (!device) 1920 1903 goto err; 1921 1904 ··· 1928 1909 return 0; 1929 1910 1930 1911 err: 1931 - vring_free_packed(vring_packed, vdev); 1912 + vring_free_packed(vring_packed, vdev, dma_dev); 1932 1913 
return -ENOMEM; 1933 1914 } 1934 1915 ··· 2006 1987 bool context, 2007 1988 bool (*notify)(struct virtqueue *), 2008 1989 void (*callback)(struct virtqueue *), 2009 - const char *name) 1990 + const char *name, 1991 + struct device *dma_dev) 2010 1992 { 2011 1993 struct vring_virtqueue_packed vring_packed = {}; 2012 1994 struct vring_virtqueue *vq; 2013 1995 int err; 2014 1996 2015 - if (vring_alloc_queue_packed(&vring_packed, vdev, num)) 1997 + if (vring_alloc_queue_packed(&vring_packed, vdev, num, dma_dev)) 2016 1998 goto err_ring; 2017 1999 2018 2000 vq = kmalloc(sizeof(*vq), GFP_KERNEL); ··· 2034 2014 vq->broken = false; 2035 2015 #endif 2036 2016 vq->packed_ring = true; 2017 + vq->dma_dev = dma_dev; 2037 2018 vq->use_dma_api = vring_use_dma_api(vdev); 2038 2019 2039 2020 vq->indirect = virtio_has_feature(vdev, VIRTIO_RING_F_INDIRECT_DESC) && ··· 2061 2040 err_state_extra: 2062 2041 kfree(vq); 2063 2042 err_vq: 2064 - vring_free_packed(&vring_packed, vdev); 2043 + vring_free_packed(&vring_packed, vdev, dma_dev); 2065 2044 err_ring: 2066 2045 return NULL; 2067 2046 } ··· 2073 2052 struct virtio_device *vdev = _vq->vdev; 2074 2053 int err; 2075 2054 2076 - if (vring_alloc_queue_packed(&vring_packed, vdev, num)) 2055 + if (vring_alloc_queue_packed(&vring_packed, vdev, num, vring_dma_dev(vq))) 2077 2056 goto err_ring; 2078 2057 2079 2058 err = vring_alloc_state_extra_packed(&vring_packed); ··· 2090 2069 return 0; 2091 2070 2092 2071 err_state_extra: 2093 - vring_free_packed(&vring_packed, vdev); 2072 + vring_free_packed(&vring_packed, vdev, vring_dma_dev(vq)); 2094 2073 err_ring: 2095 2074 virtqueue_reinit_packed(vq); 2096 2075 return -ENOMEM; ··· 2502 2481 bool context, 2503 2482 bool (*notify)(struct virtqueue *), 2504 2483 void (*callback)(struct virtqueue *), 2505 - const char *name) 2484 + const char *name, 2485 + struct device *dma_dev) 2506 2486 { 2507 2487 struct vring_virtqueue *vq; 2508 2488 int err; ··· 2529 2507 #else 2530 2508 vq->broken = false; 2531 
2509 #endif 2510 + vq->dma_dev = dma_dev; 2532 2511 vq->use_dma_api = vring_use_dma_api(vdev); 2533 2512 2534 2513 vq->indirect = virtio_has_feature(vdev, VIRTIO_RING_F_INDIRECT_DESC) && ··· 2572 2549 if (virtio_has_feature(vdev, VIRTIO_F_RING_PACKED)) 2573 2550 return vring_create_virtqueue_packed(index, num, vring_align, 2574 2551 vdev, weak_barriers, may_reduce_num, 2575 - context, notify, callback, name); 2552 + context, notify, callback, name, vdev->dev.parent); 2576 2553 2577 2554 return vring_create_virtqueue_split(index, num, vring_align, 2578 2555 vdev, weak_barriers, may_reduce_num, 2579 - context, notify, callback, name); 2556 + context, notify, callback, name, vdev->dev.parent); 2580 2557 } 2581 2558 EXPORT_SYMBOL_GPL(vring_create_virtqueue); 2559 + 2560 + struct virtqueue *vring_create_virtqueue_dma( 2561 + unsigned int index, 2562 + unsigned int num, 2563 + unsigned int vring_align, 2564 + struct virtio_device *vdev, 2565 + bool weak_barriers, 2566 + bool may_reduce_num, 2567 + bool context, 2568 + bool (*notify)(struct virtqueue *), 2569 + void (*callback)(struct virtqueue *), 2570 + const char *name, 2571 + struct device *dma_dev) 2572 + { 2573 + 2574 + if (virtio_has_feature(vdev, VIRTIO_F_RING_PACKED)) 2575 + return vring_create_virtqueue_packed(index, num, vring_align, 2576 + vdev, weak_barriers, may_reduce_num, 2577 + context, notify, callback, name, dma_dev); 2578 + 2579 + return vring_create_virtqueue_split(index, num, vring_align, 2580 + vdev, weak_barriers, may_reduce_num, 2581 + context, notify, callback, name, dma_dev); 2582 + } 2583 + EXPORT_SYMBOL_GPL(vring_create_virtqueue_dma); 2582 2584 2583 2585 /** 2584 2586 * virtqueue_resize - resize the vring of vq ··· 2693 2645 2694 2646 vring_init(&vring_split.vring, num, pages, vring_align); 2695 2647 return __vring_new_virtqueue(index, &vring_split, vdev, weak_barriers, 2696 - context, notify, callback, name); 2648 + context, notify, callback, name, 2649 + vdev->dev.parent); 2697 2650 } 2698 
2651 EXPORT_SYMBOL_GPL(vring_new_virtqueue); 2699 2652 ··· 2707 2658 vring_free_queue(vq->vq.vdev, 2708 2659 vq->packed.ring_size_in_bytes, 2709 2660 vq->packed.vring.desc, 2710 - vq->packed.ring_dma_addr); 2661 + vq->packed.ring_dma_addr, 2662 + vring_dma_dev(vq)); 2711 2663 2712 2664 vring_free_queue(vq->vq.vdev, 2713 2665 vq->packed.event_size_in_bytes, 2714 2666 vq->packed.vring.driver, 2715 - vq->packed.driver_event_dma_addr); 2667 + vq->packed.driver_event_dma_addr, 2668 + vring_dma_dev(vq)); 2716 2669 2717 2670 vring_free_queue(vq->vq.vdev, 2718 2671 vq->packed.event_size_in_bytes, 2719 2672 vq->packed.vring.device, 2720 - vq->packed.device_event_dma_addr); 2673 + vq->packed.device_event_dma_addr, 2674 + vring_dma_dev(vq)); 2721 2675 2722 2676 kfree(vq->packed.desc_state); 2723 2677 kfree(vq->packed.desc_extra); ··· 2728 2676 vring_free_queue(vq->vq.vdev, 2729 2677 vq->split.queue_size_in_bytes, 2730 2678 vq->split.vring.desc, 2731 - vq->split.queue_dma_addr); 2679 + vq->split.queue_dma_addr, 2680 + vring_dma_dev(vq)); 2732 2681 } 2733 2682 } 2734 2683 if (!vq->packed_ring) {

+10 -3

drivers/virtio/virtio_vdpa.c

···
 {
 	struct virtio_vdpa_device *vd_dev = to_virtio_vdpa_device(vdev);
 	struct vdpa_device *vdpa = vd_get_vdpa(vdev);
+	struct device *dma_dev;
 	const struct vdpa_config_ops *ops = vdpa->config;
 	struct virtio_vdpa_vq_info *info;
 	struct vdpa_callback cb;
···
 
 	/* Create the vring */
 	align = ops->get_vq_align(vdpa);
-	vq = vring_create_virtqueue(index, max_num, align, vdev,
-				    true, may_reduce_num, ctx,
-				    virtio_vdpa_notify, callback, name);
+
+	if (ops->get_vq_dma_dev)
+		dma_dev = ops->get_vq_dma_dev(vdpa, index);
+	else
+		dma_dev = vdpa_get_dma_dev(vdpa);
+	vq = vring_create_virtqueue_dma(index, max_num, align, vdev,
+					true, may_reduce_num, ctx,
+					virtio_vdpa_notify, callback,
+					name, dma_dev);
 	if (!vq) {
 		err = -ENOMEM;
 		goto error_new_virtqueue;
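The hunk above picks the DMA device per virtqueue: if the parent driver provides the optional `get_vq_dma_dev` callback it is asked, otherwise the device-wide default is used. A minimal user-space sketch of that fallback pattern follows. The struct layouts, `pick_vq_dma_dev()`, `demo_get_vq_dma_dev()` and `demo_ctrl_dev` are hypothetical stand-ins written for illustration; they are not the kernel definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel types; only the fields used here. */
struct device {
	const char *name;
};

struct vdpa_device;

struct vdpa_config_ops {
	/* Optional: NULL when one DMA device serves every virtqueue. */
	struct device *(*get_vq_dma_dev)(struct vdpa_device *vdev,
					 unsigned short idx);
};

struct vdpa_device {
	struct device *dma_dev;			/* device-wide default */
	const struct vdpa_config_ops *config;
};

/* Same decision as the hunk: prefer the per-virtqueue callback if present. */
struct device *pick_vq_dma_dev(struct vdpa_device *vdpa, unsigned short idx)
{
	assert(vdpa && vdpa->config);

	if (vdpa->config->get_vq_dma_dev)
		return vdpa->config->get_vq_dma_dev(vdpa, idx);
	return vdpa->dma_dev;
}

/* Hypothetical parent driver: queue 2 uses its own DMA device. */
struct device demo_ctrl_dev = { "ctrl" };

struct device *demo_get_vq_dma_dev(struct vdpa_device *vdev, unsigned short idx)
{
	return idx == 2 ? &demo_ctrl_dev : vdev->dma_dev;
}
```

Keeping the callback optional means existing parent drivers need no change; only a driver whose queues sit behind different DMA devices has to implement it.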

+2

include/linux/pci_ids.h

···
 
 #define PCI_VENDOR_ID_3COM_2		0xa727
 
+#define PCI_VENDOR_ID_SOLIDRUN		0xd063
+
 #define PCI_VENDOR_ID_DIGIUM		0xd161
 #define PCI_DEVICE_ID_DIGIUM_HFC4S	0xb410
 

+11 -1

include/linux/vdpa.h

···
  * @reset:			Reset device
  *				@vdev: vdpa device
  *				Returns integer: success (0) or error (< 0)
- * @suspend:			Suspend or resume the device (optional)
+ * @suspend:			Suspend the device (optional)
+ *				@vdev: vdpa device
+ *				Returns integer: success (0) or error (< 0)
+ * @resume:			Resume the device (optional)
  *				@vdev: vdpa device
  *				Returns integer: success (0) or error (< 0)
  * @get_config_size:		Get the size of the configuration space includes
···
  *				@iova: iova to be unmapped
  *				@size: size of the area
  *				Returns integer: success (0) or error (< 0)
+ * @get_vq_dma_dev:		Get the dma device for a specific
+ *				virtqueue (optional)
+ *				@vdev: vdpa device
+ *				@idx: virtqueue index
+ *				Returns pointer to structure device or error (NULL)
  * @free:			Free resources that belongs to vDPA (optional)
  *				@vdev: vdpa device
  */
···
 	void (*set_status)(struct vdpa_device *vdev, u8 status);
 	int (*reset)(struct vdpa_device *vdev);
 	int (*suspend)(struct vdpa_device *vdev);
+	int (*resume)(struct vdpa_device *vdev);
 	size_t (*get_config_size)(struct vdpa_device *vdev);
 	void (*get_config)(struct vdpa_device *vdev, unsigned int offset,
 			   void *buf, unsigned int len);
···
 			     u64 iova, u64 size);
 	int (*set_group_asid)(struct vdpa_device *vdev, unsigned int group,
 			      unsigned int asid);
+	struct device *(*get_vq_dma_dev)(struct vdpa_device *vdev, u16 idx);
 
 	/* Free device resources */
 	void (*free)(struct vdpa_device *vdev);

+5 -3

include/linux/virtio_config.h

···
 	u64 len;
 };
 
+typedef void vq_callback_t(struct virtqueue *);
+
 /**
- * virtio_config_ops - operations for configuring a virtio device
+ * struct virtio_config_ops - operations for configuring a virtio device
  * Note: Do not assume that a transport implements all of the operations
  *       getting/setting a value as a simple read/write! Generally speaking,
  *       any of @get/@set, @get_status/@set_status, or @get_features/
···
  *	vdev: the virtio_device
  *	This sends the driver feature bits to the device: it can change
  *	the dev->feature bits if it wants.
- * Note: despite the name this can be called any number of times.
+ *	Note that despite the name this can be called any number of
+ *	times.
  *	Returns 0 on success or error status
  * @bus_name: return the bus name associated with the device (optional)
  *	vdev: the virtio_device
···
  * If disable_vq_and_reset is set, then enable_vq_after_reset must also be
  * set.
  */
-typedef void vq_callback_t(struct virtqueue *);
 struct virtio_config_ops {
 	void (*get)(struct virtio_device *vdev, unsigned offset,
 		    void *buf, unsigned len);

+16

include/linux/virtio_ring.h

···
 					 const char *name);
 
 /*
+ * Creates a virtqueue and allocates the descriptor ring with per
+ * virtqueue DMA device.
+ */
+struct virtqueue *vring_create_virtqueue_dma(unsigned int index,
+					     unsigned int num,
+					     unsigned int vring_align,
+					     struct virtio_device *vdev,
+					     bool weak_barriers,
+					     bool may_reduce_num,
+					     bool ctx,
+					     bool (*notify)(struct virtqueue *vq),
+					     void (*callback)(struct virtqueue *vq),
+					     const char *name,
+					     struct device *dma_dev);
+
+/*
  * Creates a virtqueue with a standard layout but a caller-allocated
  * ring.
  */

+1 -1

include/linux/vringh.h

···
 };
 
 /**
- * struct vringh_iov - kvec mangler.
+ * struct vringh_kiov - kvec mangler.
  *
  * Mangles kvec in place, and restores it.
  * Remaining data is iov + i, of used - i elements.

+8

include/uapi/linux/vhost.h

···
  */
 #define VHOST_VDPA_SUSPEND		_IO(VHOST_VIRTIO, 0x7D)
 
+/* Resume a device so it can resume processing virtqueue requests
+ *
+ * After the return of this ioctl the device will have restored all the
+ * necessary states and it is fully operational to continue processing the
+ * virtqueue descriptors.
+ */
+#define VHOST_VDPA_RESUME		_IO(VHOST_VIRTIO, 0x7E)
+
 #endif

+2

include/uapi/linux/vhost_types.h

···
 #define VHOST_BACKEND_F_IOTLB_ASID  0x3
 /* Device can be suspended */
 #define VHOST_BACKEND_F_SUSPEND  0x4
+/* Device can be resumed */
+#define VHOST_BACKEND_F_RESUME  0x5
 
 #endif
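The new `VHOST_BACKEND_F_RESUME` value lets user space find out whether a suspended device can be resumed before it relies on `VHOST_VDPA_RESUME`. The sketch below assumes, as with the other `VHOST_BACKEND_F_*` values, that the constant is a bit position in a 64-bit backend feature word rather than a mask. `backend_has_feature()` and `can_suspend_and_resume()` are hypothetical helpers, not part of the UAPI.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values copied from the hunk above; assumed to be bit positions. */
#define VHOST_BACKEND_F_SUSPEND	0x4
#define VHOST_BACKEND_F_RESUME	0x5

/* Hypothetical helper: true if bit `feature` is set in `features`. */
bool backend_has_feature(uint64_t features, unsigned int feature)
{
	return (features >> feature) & 1;
}

/* Hypothetical policy: only suspend a device that can be resumed later. */
bool can_suspend_and_resume(uint64_t features)
{
	return backend_has_feature(features, VHOST_BACKEND_F_SUSPEND) &&
	       backend_has_feature(features, VHOST_BACKEND_F_RESUME);
}
```

In a real program the feature word would come from the backend features query on the vhost-vdpa file descriptor; the sketch deliberately leaves that call out and works on a plain integer.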

+105

include/uapi/linux/virtio_blk.h

···
 #define VIRTIO_BLK_F_DISCARD	13	/* DISCARD is supported */
 #define VIRTIO_BLK_F_WRITE_ZEROES	14	/* WRITE ZEROES is supported */
 #define VIRTIO_BLK_F_SECURE_ERASE	16	/* Secure Erase is supported */
+#define VIRTIO_BLK_F_ZONED	17	/* Zoned block device */
 
 /* Legacy feature bits */
 #ifndef VIRTIO_BLK_NO_LEGACY
···
 	/* Secure erase commands must be aligned to this number of sectors. */
 	__virtio32 secure_erase_sector_alignment;
 
+	/* Zoned block device characteristics (if VIRTIO_BLK_F_ZONED) */
+	struct virtio_blk_zoned_characteristics {
+		__le32 zone_sectors;
+		__le32 max_open_zones;
+		__le32 max_active_zones;
+		__le32 max_append_sectors;
+		__le32 write_granularity;
+		__u8 model;
+		__u8 unused2[3];
+	} zoned;
 } __attribute__((packed));
 
 /*
···
 /* Secure erase command */
 #define VIRTIO_BLK_T_SECURE_ERASE	14
 
+/* Zone append command */
+#define VIRTIO_BLK_T_ZONE_APPEND	15
+
+/* Report zones command */
+#define VIRTIO_BLK_T_ZONE_REPORT	16
+
+/* Open zone command */
+#define VIRTIO_BLK_T_ZONE_OPEN	18
+
+/* Close zone command */
+#define VIRTIO_BLK_T_ZONE_CLOSE	20
+
+/* Finish zone command */
+#define VIRTIO_BLK_T_ZONE_FINISH	22
+
+/* Reset zone command */
+#define VIRTIO_BLK_T_ZONE_RESET	24
+
+/* Reset All zones command */
+#define VIRTIO_BLK_T_ZONE_RESET_ALL	26
+
 #ifndef VIRTIO_BLK_NO_LEGACY
 /* Barrier before this op. */
 #define VIRTIO_BLK_T_BARRIER	0x80000000
···
 	/* Sector (ie. 512 byte offset) */
 	__virtio64 sector;
 };
+
+/*
+ * Supported zoned device models.
+ */
+
+/* Regular block device */
+#define VIRTIO_BLK_Z_NONE	0
+/* Host-managed zoned device */
+#define VIRTIO_BLK_Z_HM	1
+/* Host-aware zoned device */
+#define VIRTIO_BLK_Z_HA	2
+
+/*
+ * Zone descriptor. A part of VIRTIO_BLK_T_ZONE_REPORT command reply.
+ */
+struct virtio_blk_zone_descriptor {
+	/* Zone capacity */
+	__le64 z_cap;
+	/* The starting sector of the zone */
+	__le64 z_start;
+	/* Zone write pointer position in sectors */
+	__le64 z_wp;
+	/* Zone type */
+	__u8 z_type;
+	/* Zone state */
+	__u8 z_state;
+	__u8 reserved[38];
+};
+
+struct virtio_blk_zone_report {
+	__le64 nr_zones;
+	__u8 reserved[56];
+	struct virtio_blk_zone_descriptor zones[];
+};
+
+/*
+ * Supported zone types.
+ */
+
+/* Conventional zone */
+#define VIRTIO_BLK_ZT_CONV	1
+/* Sequential Write Required zone */
+#define VIRTIO_BLK_ZT_SWR	2
+/* Sequential Write Preferred zone */
+#define VIRTIO_BLK_ZT_SWP	3
+
+/*
+ * Zone states that are available for zones of all types.
+ */
+
+/* Not a write pointer (conventional zones only) */
+#define VIRTIO_BLK_ZS_NOT_WP	0
+/* Empty */
+#define VIRTIO_BLK_ZS_EMPTY	1
+/* Implicitly Open */
+#define VIRTIO_BLK_ZS_IOPEN	2
+/* Explicitly Open */
+#define VIRTIO_BLK_ZS_EOPEN	3
+/* Closed */
+#define VIRTIO_BLK_ZS_CLOSED	4
+/* Read-Only */
+#define VIRTIO_BLK_ZS_RDONLY	13
+/* Full */
+#define VIRTIO_BLK_ZS_FULL	14
+/* Offline */
+#define VIRTIO_BLK_ZS_OFFLINE	15
 
 /* Unmap this range (only valid for write zeroes command) */
 #define VIRTIO_BLK_WRITE_ZEROES_FLAG_UNMAP	0x00000001
···
 #define VIRTIO_BLK_S_OK		0
 #define VIRTIO_BLK_S_IOERR	1
 #define VIRTIO_BLK_S_UNSUPP	2
+
+/* Error codes that are specific to zoned block devices */
+#define VIRTIO_BLK_S_ZONE_INVALID_CMD	3
+#define VIRTIO_BLK_S_ZONE_UNALIGNED_WP	4
+#define VIRTIO_BLK_S_ZONE_OPEN_RESOURCE	5
+#define VIRTIO_BLK_S_ZONE_ACTIVE_RESOURCE	6
+
 #endif /* _LINUX_VIRTIO_BLK_H */
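The zone report layout added above is a 64-byte header (`nr_zones` plus padding) followed by 64-byte zone descriptors. The following user-space sketch walks such a reply buffer and counts zones in a given state. It is an illustration under stated assumptions: the struct names `zone_descriptor` and `zone_report_header`, and the helpers `get_le64()` and `count_zones_in_state()`, are hypothetical; plain `uint64_t`/`uint8_t` stand in for `__le64`/`__u8`; and the descriptor count is clamped to what fits in the buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* State values copied from the hunk above. */
#define VIRTIO_BLK_ZS_EMPTY	1
#define VIRTIO_BLK_ZS_FULL	14

/* Hypothetical mirror of struct virtio_blk_zone_descriptor. */
struct zone_descriptor {
	uint64_t z_cap;		/* little-endian on the wire */
	uint64_t z_start;
	uint64_t z_wp;
	uint8_t z_type;
	uint8_t z_state;
	uint8_t reserved[38];
};

/* Hypothetical mirror of the fixed part of struct virtio_blk_zone_report. */
struct zone_report_header {
	uint64_t nr_zones;	/* little-endian on the wire */
	uint8_t reserved[56];
};

static_assert(sizeof(struct zone_descriptor) == 64, "descriptor is 64 bytes");
static_assert(sizeof(struct zone_report_header) == 64, "header is 64 bytes");

/* Decode a little-endian 64-bit value independent of host byte order. */
static uint64_t get_le64(const unsigned char *p)
{
	uint64_t v = 0;
	int i;

	for (i = 7; i >= 0; i--)
		v = (v << 8) | p[i];
	return v;
}

/* Count descriptors in `buf` whose z_state equals `state`. */
size_t count_zones_in_state(const unsigned char *buf, size_t len, uint8_t state)
{
	struct zone_descriptor d;
	uint64_t nr, fit;
	size_t i, n = 0;

	if (len < sizeof(struct zone_report_header))
		return 0;

	nr = get_le64(buf);
	/* Never trust nr_zones beyond what the buffer can actually hold. */
	fit = (len - sizeof(struct zone_report_header)) / sizeof(d);
	if (nr > fit)
		nr = fit;

	for (i = 0; i < nr; i++) {
		memcpy(&d, buf + sizeof(struct zone_report_header) +
			   i * sizeof(d), sizeof(d));
		if (d.z_state == state)
			n++;
	}
	return n;
}
```

Copying each descriptor out with `memcpy` avoids alignment assumptions about the reply buffer, at the cost of one 64-byte copy per zone.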

+1 -1

tools/virtio/Makefile

···
 virtio_test: virtio_ring.o virtio_test.o
 vringh_test: vringh_test.o vringh.o virtio_ring.o
 
-CFLAGS += -g -O2 -Werror -Wno-maybe-uninitialized -Wall -I. -I../include/ -I ../../usr/include/ -Wno-pointer-sign -fno-strict-overflow -fno-strict-aliasing -fno-common -MMD -U_FORTIFY_SOURCE -include ../../include/linux/kconfig.h
+CFLAGS += -g -O2 -Werror -Wno-maybe-uninitialized -Wall -I. -I../include/ -I ../../usr/include/ -Wno-pointer-sign -fno-strict-overflow -fno-strict-aliasing -fno-common -MMD -U_FORTIFY_SOURCE -include ../../include/linux/kconfig.h -mfunction-return=thunk -fcf-protection=none -mindirect-branch-register
 CFLAGS += -pthread
 LDFLAGS += -pthread
 vpath %.c ../../drivers/virtio ../../drivers/vhost
