Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma

-15

Documentation/driver-api/infiniband.rst

···
 .. kernel-doc:: drivers/infiniband/ulp/iser/iser_verbs.c
    :internal:
 
-Omni-Path (OPA) Virtual NIC support
------------------------------------
-
-.. kernel-doc:: drivers/infiniband/ulp/opa_vnic/opa_vnic_internal.h
-   :internal:
-
-.. kernel-doc:: drivers/infiniband/ulp/opa_vnic/opa_vnic_encap.h
-   :internal:
-
-.. kernel-doc:: drivers/infiniband/ulp/opa_vnic/opa_vnic_vema_iface.c
-   :internal:
-
-.. kernel-doc:: drivers/infiniband/ulp/opa_vnic/opa_vnic_vema.c
-   :internal:
-
 InfiniBand SCSI RDMA protocol target support
 --------------------------------------------
 

-1

Documentation/infiniband/index.rst

···
 
    core_locking
    ipoib
-   opa_vnic
    sysfs
    tag_matching
    ucaps

-159

Documentation/infiniband/opa_vnic.rst

···
-=================================================================
-Intel Omni-Path (OPA) Virtual Network Interface Controller (VNIC)
-=================================================================
-
-Intel Omni-Path (OPA) Virtual Network Interface Controller (VNIC) feature
-supports Ethernet functionality over Omni-Path fabric by encapsulating
-the Ethernet packets between HFI nodes.
-
-Architecture
-=============
-The patterns of exchanges of Omni-Path encapsulated Ethernet packets
-involves one or more virtual Ethernet switches overlaid on the Omni-Path
-fabric topology. A subset of HFI nodes on the Omni-Path fabric are
-permitted to exchange encapsulated Ethernet packets across a particular
-virtual Ethernet switch. The virtual Ethernet switches are logical
-abstractions achieved by configuring the HFI nodes on the fabric for
-header generation and processing. In the simplest configuration all HFI
-nodes across the fabric exchange encapsulated Ethernet packets over a
-single virtual Ethernet switch. A virtual Ethernet switch, is effectively
-an independent Ethernet network. The configuration is performed by an
-Ethernet Manager (EM) which is part of the trusted Fabric Manager (FM)
-application. HFI nodes can have multiple VNICs each connected to a
-different virtual Ethernet switch. The below diagram presents a case
-of two virtual Ethernet switches with two HFI nodes::
-
-  +-------------------+
-  | Subnet/ |
-  | Ethernet |
-  | Manager |
-  +-------------------+
-  / /
-  / /
-  / /
-  / /
-  +-----------------------------+ +------------------------------+
-  | Virtual Ethernet Switch | | Virtual Ethernet Switch |
-  | +---------+ +---------+ | | +---------+ +---------+ |
-  | | VPORT | | VPORT | | | | VPORT | | VPORT | |
-  +--+---------+----+---------+-+ +-+---------+----+---------+---+
-  | \ / |
-  | \ / |
-  | \/ |
-  | / \ |
-  | / \ |
-  +-----------+------------+ +-----------+------------+
-  | VNIC | VNIC | | VNIC | VNIC |
-  +-----------+------------+ +-----------+------------+
-  | HFI | | HFI |
-  +------------------------+ +------------------------+
-
-
-The Omni-Path encapsulated Ethernet packet format is as described below.
-
-==================== ================================
-Bits                 Field
-==================== ================================
-Quad Word 0:
-0-19                 SLID (lower 20 bits)
-20-30                Length (in Quad Words)
-31                   BECN bit
-32-51                DLID (lower 20 bits)
-52-56                SC (Service Class)
-57-59                RC (Routing Control)
-60                   FECN bit
-61-62                L2 (=10, 16B format)
-63                   LT (=1, Link Transfer Head Flit)
-
-Quad Word 1:
-0-7                  L4 type (=0x78 ETHERNET)
-8-11                 SLID[23:20]
-12-15                DLID[23:20]
-16-31                PKEY
-32-47                Entropy
-48-63                Reserved
-
-Quad Word 2:
-0-15                 Reserved
-16-31                L4 header
-32-63                Ethernet Packet
-
-Quad Words 3 to N-1:
-0-63                 Ethernet packet (pad extended)
-
-Quad Word N (last):
-0-23                 Ethernet packet (pad extended)
-24-55                ICRC
-56-61                Tail
-62-63                LT (=01, Link Transfer Tail Flit)
-==================== ================================
-
-Ethernet packet is padded on the transmit side to ensure that the VNIC OPA
-packet is quad word aligned. The 'Tail' field contains the number of bytes
-padded. On the receive side the 'Tail' field is read and the padding is
-removed (along with ICRC, Tail and OPA header) before passing packet up
-the network stack.
-
-The L4 header field contains the virtual Ethernet switch id the VNIC port
-belongs to. On the receive side, this field is used to de-multiplex the
-received VNIC packets to different VNIC ports.
-
-Driver Design
-==============
-Intel OPA VNIC software design is presented in the below diagram.
-OPA VNIC functionality has a HW dependent component and a HW
-independent component.
-
-The support has been added for IB device to allocate and free the RDMA
-netdev devices. The RDMA netdev supports interfacing with the network
-stack thus creating standard network interfaces. OPA_VNIC is an RDMA
-netdev device type.
-
-The HW dependent VNIC functionality is part of the HFI1 driver. It
-implements the verbs to allocate and free the OPA_VNIC RDMA netdev.
-It involves HW resource allocation/management for VNIC functionality.
-It interfaces with the network stack and implements the required
-net_device_ops functions. It expects Omni-Path encapsulated Ethernet
-packets in the transmit path and provides HW access to them. It strips
-the Omni-Path header from the received packets before passing them up
-the network stack. It also implements the RDMA netdev control operations.
-
-The OPA VNIC module implements the HW independent VNIC functionality.
-It consists of two parts. The VNIC Ethernet Management Agent (VEMA)
-registers itself with IB core as an IB client and interfaces with the
-IB MAD stack. It exchanges the management information with the Ethernet
-Manager (EM) and the VNIC netdev. The VNIC netdev part allocates and frees
-the OPA_VNIC RDMA netdev devices. It overrides the net_device_ops functions
-set by HW dependent VNIC driver where required to accommodate any control
-operation. It also handles the encapsulation of Ethernet packets with an
-Omni-Path header in the transmit path. For each VNIC interface, the
-information required for encapsulation is configured by the EM via VEMA MAD
-interface. It also passes any control information to the HW dependent driver
-by invoking the RDMA netdev control operations::
-
-  +-------------------+ +----------------------+
-  | | | Linux |
-  | IB MAD | | Network |
-  | | | Stack |
-  +-------------------+ +----------------------+
-  | | |
-  | | |
-  +----------------------------+ |
-  | | |
-  | OPA VNIC Module | |
-  | (OPA VNIC RDMA Netdev | |
-  | & EMA functions) | |
-  | | |
-  +----------------------------+ |
-  | |
-  | |
-  +------------------+ |
-  | IB core | |
-  +------------------+ |
-  | |
-  | |
-  +--------------------------------------------+
-  | |
-  | HFI1 Driver with VNIC support |
-  | |
-  +--------------------------------------------+
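The bit layout in the removed opa_vnic.rst table can be illustrated with a small userspace sketch. The sketch makes several assumptions:

- Each quad word has already been loaded into a host-order uint64_t, with bit 0 as the least significant bit. The table does not specify the byte order on the wire.
- Every name in the sketch (opa16b_fields, opa16b_decode(), opa_vnic_pad_bytes()) is hypothetical. None of them is the hfi1 or opa_vnic API.
- The pad calculation uses the layout the table implies: 20 bytes ahead of the Ethernet frame (QW0, QW1 and the low 32 bits of QW2) and 5 bytes after it (the 4-byte ICRC plus the byte holding Tail and LT).

```c
#include <assert.h>
#include <stdint.h>

/* Fields decoded from quad words 0 and 1 (hypothetical type). */
struct opa16b_fields {
	uint32_t slid;    /* 24-bit source LID */
	uint32_t dlid;    /* 24-bit destination LID */
	uint32_t len_qws; /* packet length in quad words */
	uint8_t sc;       /* service class */
	uint8_t l4_type;  /* 0x78 for encapsulated Ethernet */
	uint16_t pkey;
	uint16_t entropy;
};

/* Extract 'width' bits starting at bit 'lsb' of a host-order quad word. */
static uint64_t bits(uint64_t qw, unsigned int lsb, unsigned int width)
{
	assert(width > 0 && width < 64);
	return (qw >> lsb) & ((UINT64_C(1) << width) - 1);
}

static struct opa16b_fields opa16b_decode(uint64_t qw0, uint64_t qw1)
{
	struct opa16b_fields f;

	/* Lower 20 bits of each LID are in QW0, bits 23:20 are in QW1. */
	f.slid = (uint32_t)(bits(qw0, 0, 20) | (bits(qw1, 8, 4) << 20));
	f.dlid = (uint32_t)(bits(qw0, 32, 20) | (bits(qw1, 12, 4) << 20));
	f.len_qws = (uint32_t)bits(qw0, 20, 11);
	f.sc = (uint8_t)bits(qw0, 52, 5);
	f.l4_type = (uint8_t)bits(qw1, 0, 8);
	f.pkey = (uint16_t)bits(qw1, 16, 16);
	f.entropy = (uint16_t)bits(qw1, 32, 16);
	return f;
}

/* Bytes before the Ethernet frame: QW0 + QW1 + reserved + L4 header. */
#define OPA16B_ETH_OFFSET 20u
/* Bytes after the padded frame: 4-byte ICRC + Tail/LT byte. */
#define OPA16B_TRAILER 5u

/* Pad so that header + frame + pad + trailer is a multiple of 8 bytes. */
static unsigned int opa_vnic_pad_bytes(unsigned int eth_len)
{
	unsigned int unpadded = OPA16B_ETH_OFFSET + eth_len + OPA16B_TRAILER;

	return (8u - unpadded % 8u) % 8u;
}
```

Under these assumptions, a 60-byte frame needs 3 pad bytes (20 + 60 + 3 + 5 = 88, or eleven quad words). That pad count is what the Tail field would then carry.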

-1

Documentation/translations/zh_CN/infiniband/index.rst

···
 
    core_locking
    ipoib
-   opa_vnic
    sysfs
    tag_matching
    user_mad

-156

Documentation/translations/zh_CN/infiniband/opa_vnic.rst

···
-.. include:: ../disclaimer-zh_CN.rst
-
-:Original: Documentation/infiniband/opa_vnic.rst
-
-:翻译:
-
- 司延腾 Yanteng Si <siyanteng@loongson.cn>
-
-:校译:
-
- 王普宇 Puyu Wang <realpuyuwang@gmail.com>
- 时奎亮 Alex Shi <alexs@kernel.org>
-
-.. _cn_infiniband_opa_vnic:
-
-=============================================
-英特尔全路径（OPA）虚拟网络接口控制器（VNIC）
-=============================================
-
-英特尔全路径（OPA）虚拟网络接口控制器（VNIC）功能通过封装HFI节点之间的以
-太网数据包，支持Omni-Path结构上的以太网功能。
-
-体系结构
-========
-
-Omni-Path封装的以太网数据包的交换模式涉及Omni-Path结构拓扑上覆盖的一个或
-多个虚拟以太网交换机。Omni-Path结构上的HFI节点的一个子集被允许在特定的虚
-拟以太网交换机上交换封装的以太网数据包。虚拟以太网交换机是通过配置结构上的
-HFI节点实现的逻辑抽象，用于生成和处理报头。在最简单的配置中，整个结构的所有
-HFI节点通过一个虚拟以太网交换机交换封装的以太网数据包。一个虚拟以太网交换机，
-实际上是一个独立的以太网网络。该配置由以太网管理器（EM）执行，它是可信的结
-构管理器（FM）应用程序的一部分。HFI节点可以有多个VNIC，每个连接到不同的虚
-拟以太网交换机。下图介绍了两个虚拟以太网交换机与两个HFI节点的情况::
-
-  +-------------------+
-  | 子网/ |
-  | 以太网 |
-  | 管理 |
-  +-------------------+
-  / /
-  / /
-  / /
-  / /
-  +-----------------------------+ +------------------------------+
-  | 虚拟以太网切换 | | 虚拟以太网切换 |
-  | +---------+ +---------+ | | +---------+ +---------+ |
-  | | VPORT | | VPORT | | | | VPORT | | VPORT | |
-  +--+---------+----+---------+-+ +-+---------+----+---------+---+
-  | \ / |
-  | \ / |
-  | \/ |
-  | / \ |
-  | / \ |
-  +-----------+------------+ +-----------+------------+
-  | VNIC | VNIC | | VNIC | VNIC |
-  +-----------+------------+ +-----------+------------+
-  | HFI | | HFI |
-  +------------------------+ +------------------------+
-
-
-Omni-Path封装的以太网数据包格式如下所述。
-
-==================== ================================
-位域
-==================== ================================
-Quad Word 0:
-0-19                 SLID (低20位)
-20-30                长度 (以四字为单位)
-31                   BECN 位
-32-51                DLID (低20位)
-52-56                SC (服务级别)
-57-59                RC (路由控制)
-60                   FECN 位
-61-62                L2 (=10, 16B 格式)
-63                   LT (=1, 链路传输头 Flit)
-
-Quad Word 1:
-0-7                  L4 type (=0x78 ETHERNET)
-8-11                 SLID[23:20]
-12-15                DLID[23:20]
-16-31                PKEY
-32-47                熵
-48-63                保留
-
-Quad Word 2:
-0-15                 保留
-16-31                L4 头
-32-63                以太网数据包
-
-Quad Words 3 to N-1:
-0-63                 以太网数据包 (pad拓展)
-
-Quad Word N (last):
-0-23                 以太网数据包 (pad拓展)
-24-55                ICRC
-56-61                尾
-62-63                LT (=01, 链路传输尾 Flit)
-==================== ================================
-
-以太网数据包在传输端被填充，以确保VNIC OPA数据包是四字对齐的。“尾”字段
-包含填充的字节数。在接收端，“尾”字段被读取，在将数据包向上传递到网络堆
-栈之前，填充物被移除（与ICRC、尾和OPA头一起）。
-
-L4头字段包含VNIC端口所属的虚拟以太网交换机ID。在接收端，该字段用于将收
-到的VNIC数据包去多路复用到不同的VNIC端口。
-
-驱动设计
-========
-
-英特尔OPA VNIC的软件设计如下图所示。OPA VNIC功能有一个依赖于硬件的部分
-和一个独立于硬件的部分。
-
-对IB设备分配和释放RDMA netdev设备的支持已经被加入。RDMA netdev支持与
-网络堆栈的对接，从而创建标准的网络接口。OPA_VNIC是一个RDMA netdev设备
-类型。
-
-依赖于HW的VNIC功能是HFI1驱动的一部分。它实现了分配和释放OPA_VNIC RDMA
-netdev的动作。它涉及VNIC功能的HW资源分配/管理。它与网络堆栈接口并实现所
-需的net_device_ops功能。它在传输路径中期待Omni-Path封装的以太网数据包，
-并提供对它们的HW访问。在将数据包向上传递到网络堆栈之前，它把Omni-Path头
-从接收的数据包中剥离。它还实现了RDMA netdev控制操作。
-
-OPA VNIC模块实现了独立于硬件的VNIC功能。它由两部分组成。VNIC以太网管理
-代理（VEMA）作为一个IB客户端向IB核心注册，并与IB MAD栈接口。它与以太网
-管理器（EM）和VNIC netdev交换管理信息。VNIC netdev部分分配和释放OPA_VNIC
-RDMA netdev设备。它在需要时覆盖由依赖HW的VNIC驱动设置的net_device_ops函数，
-以适应任何控制操作。它还处理以太网数据包的封装，在传输路径中使用Omni-Path头。
-对于每个VNIC接口，封装所需的信息是由EM通过VEMA MAD接口配置的。它还通过调用
-RDMA netdev控制操作将任何控制信息传递给依赖于HW的驱动程序::
-
-  +-------------------+ +----------------------+
-  | | | Linux |
-  | IB MAD | | 网络 |
-  | | | 栈 |
-  +-------------------+ +----------------------+
-  | | |
-  | | |
-  +----------------------------+ |
-  | | |
-  | OPA VNIC 模块 | |
-  | (OPA VNIC RDMA Netdev | |
-  | & EMA 函数) | |
-  | | |
-  +----------------------------+ |
-  | |
-  | |
-  +------------------+ |
-  | IB 核心 | |
-  +------------------+ |
-  | |
-  | |
-  +--------------------------------------------+
-  | |
-  | HFI1 驱动和 VNIC 支持 |
-  | |
-  +--------------------------------------------+

+2 -6

MAINTAINERS

···
 F:	include/uapi/rdma/
 F:	samples/bpf/ibumad_kern.c
 F:	samples/bpf/ibumad_user.c
+F:	tools/testing/selftests/rdma/
 
 INGENIC JZ4780 NAND DRIVER
 M:	Harvey Hunt <harveyhuntnexus@gmail.com>
···
 S:	Maintained
 F:	drivers/rtc/rtc-optee.c
 
-OPA-VNIC DRIVER
-M:	Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
-L:	linux-rdma@vger.kernel.org
-S:	Supported
-F:	drivers/infiniband/ulp/opa_vnic
-
 OPEN ALLIANCE 10BASE-T1S MACPHY SERIAL INTERFACE FRAMEWORK
 M:	Parthiban Veerasooran <parthiban.veerasooran@microchip.com>
 L:	netdev@vger.kernel.org
···
 S:	Supported
 F:	drivers/infiniband/sw/rxe/
 F:	include/uapi/rdma/rdma_user_rxe.h
+F:	tools/testing/selftests/rdma/rxe*
 
 SOFTLOGIC 6x10 MPEG CODEC
 M:	Bluecherry Maintainers <maintainers@bluecherrydvr.com>

-2

drivers/infiniband/Kconfig

···
 source "drivers/infiniband/ulp/isert/Kconfig"
 source "drivers/infiniband/ulp/rtrs/Kconfig"
 
-source "drivers/infiniband/ulp/opa_vnic/Kconfig"
-
 endif # INFINIBAND

+3 -3

drivers/infiniband/core/Makefile

···
 	roce_gid_mgmt.o mr_pool.o addr.o sa_query.o \
 	multicast.o mad.o smi.o agent.o mad_rmpp.o \
 	nldev.o restrack.o counters.o ib_core_uverbs.o \
-	trace.o lag.o
+	trace.o lag.o iter.o frmr_pools.o
 
 ib_core-$(CONFIG_SECURITY_INFINIBAND) += security.o
 ib_core-$(CONFIG_CGROUP_RDMA) += cgroup.o
+ib_core-$(CONFIG_INFINIBAND_USER_MEM) += umem.o umem_dmabuf.o
+ib_core-$(CONFIG_INFINIBAND_ON_DEMAND_PAGING) += umem_odp.o
 
 ib_cm-y := cm.o cm_trace.o
 
···
 	uverbs_std_types_wq.o \
 	uverbs_std_types_qp.o \
 	ucaps.o
-ib_uverbs-$(CONFIG_INFINIBAND_USER_MEM) += umem.o umem_dmabuf.o
-ib_uverbs-$(CONFIG_INFINIBAND_ON_DEMAND_PAGING) += umem_odp.o

+3

drivers/infiniband/core/addr.c

···
 	if (!n)
 		return -ENODATA;
 
+	read_lock_bh(&n->lock);
 	if (!(n->nud_state & NUD_VALID)) {
+		read_unlock_bh(&n->lock);
 		neigh_event_send(n, NULL);
 		ret = -ENODATA;
 	} else {
 		neigh_ha_snapshot(dev_addr->dst_dev_addr, n, dst->dev);
+		read_unlock_bh(&n->lock);
 	}
 
 	neigh_release(n);
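In the addr.c hunk, read_lock_bh(&n->lock) now covers both the NUD_VALID test and the neigh_ha_snapshot() copy. On the invalid path, the lock is dropped before neigh_event_send(). Our reading is that this keeps the state check and the address copy consistent with a concurrent neighbour update; that motivation is an interpretation, not a quote from the commit.

The userspace analogue below uses a POSIX rwlock. It is a sketch of the locking pattern only. struct fake_neigh, FAKE_NUD_VALID, snapshot_if_valid() and update_neigh() are hypothetical stand-ins, not kernel APIs.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <string.h>

#define FAKE_NUD_VALID 0x1u /* hypothetical stand-in for NUD_VALID */
#define HW_ADDR_LEN 6

struct fake_neigh {
	pthread_rwlock_t lock; /* stands in for n->lock */
	unsigned int nud_state;
	unsigned char ha[HW_ADDR_LEN];
};

/*
 * Copy the hardware address only if the entry is valid. The state test and
 * the copy happen under one read lock, so a writer cannot change either in
 * between. Returns 0 on success, -1 if the entry is not valid.
 */
static int snapshot_if_valid(struct fake_neigh *n, unsigned char *out)
{
	int ret = 0;

	assert(n != NULL && out != NULL);
	pthread_rwlock_rdlock(&n->lock);
	if (!(n->nud_state & FAKE_NUD_VALID))
		ret = -1;
	else
		memcpy(out, n->ha, HW_ADDR_LEN);
	pthread_rwlock_unlock(&n->lock);

	/* Follow-up work (e.g. kicking resolution) would run here, unlocked. */
	return ret;
}

/* Writer side: state and address change together under the write lock. */
static void update_neigh(struct fake_neigh *n, unsigned int state,
			 const unsigned char *ha)
{
	pthread_rwlock_wrlock(&n->lock);
	n->nud_state = state;
	memcpy(n->ha, ha, HW_ADDR_LEN);
	pthread_rwlock_unlock(&n->lock);
}
```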

+3 -12

drivers/infiniband/core/cache.c

···
 	/* rwlock protects data_vec[ix]->state and entry pointer.
 	 */
 	rwlock_t rwlock;
-	struct ib_gid_table_entry **data_vec;
 	/* bit field, each bit indicates the index of default GID */
 	u32 default_gid_indices;
+	struct ib_gid_table_entry *data_vec[] __counted_by(sz);
 };
 
 static void dispatch_gid_change_event(struct ib_device *ib_dev, u32 port)
···
 
 static struct ib_gid_table *alloc_gid_table(int sz)
 {
-	struct ib_gid_table *table = kzalloc_obj(*table);
+	struct ib_gid_table *table = kzalloc_flex(*table, data_vec, sz);
 
 	if (!table)
 		return NULL;
 
-	table->data_vec = kzalloc_objs(*table->data_vec, sz);
-	if (!table->data_vec)
-		goto err_free_table;
+	table->sz = sz;
 
 	mutex_init(&table->lock);
-
-	table->sz = sz;
 	rwlock_init(&table->rwlock);
 	return table;
-
-err_free_table:
-	kfree(table);
-	return NULL;
 }
 
 static void release_gid_table(struct ib_device *device,
···
 	}
 
 	mutex_destroy(&table->lock);
-	kfree(table->data_vec);
 	kfree(table);
 }
 
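The cache.c change folds the separately allocated data_vec pointer array into a flexible array member at the end of struct ib_gid_table, annotated with __counted_by(sz). As a result, one allocation and one kfree() cover both, and the err_free_table path goes away.

kzalloc_flex() and __counted_by are kernel facilities. The userspace sketch below mirrors only the memory layout, using calloc() and offsetof(). struct gid_table, alloc_table() and release_table() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct gid_entry; /* opaque, like struct ib_gid_table_entry */

struct gid_table {
	int sz;                        /* number of slots in data_vec[] */
	unsigned int default_gid_indices;
	struct gid_entry *data_vec[];  /* flexible array of sz pointers */
};

/* One zeroed allocation holds the header and all sz slots. */
static struct gid_table *alloc_table(int sz)
{
	struct gid_table *table;

	assert(sz >= 0);
	table = calloc(1, offsetof(struct gid_table, data_vec) +
			  (size_t)sz * sizeof(table->data_vec[0]));
	if (!table)
		return NULL;
	/* In the kernel, __counted_by(sz) ties bounds checks to this field. */
	table->sz = sz;
	return table;
}

/* A single free() releases everything; there is no second buffer. */
static void release_table(struct gid_table *table)
{
	free(table);
}
```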

+3

drivers/infiniband/core/cq.c

···
 	struct ib_cq *cq;
 	int ret = -ENOMEM;
 
+	if (WARN_ON_ONCE(!nr_cqe))
+		return ERR_PTR(-EINVAL);
+
 	cq = rdma_zalloc_drv_obj(dev, ib_cq);
 	if (!cq)
 		return ERR_PTR(ret);

+3 -3

drivers/infiniband/core/device.c

···
 
 	dev_ops->uverbs_no_driver_id_binding |=
 		ops->uverbs_no_driver_id_binding;
+	dev_ops->uverbs_robust_udata |= ops->uverbs_robust_udata;
 
 	SET_DEVICE_OP(dev_ops, add_gid);
 	SET_DEVICE_OP(dev_ops, add_sub_dev);
···
 	SET_DEVICE_OP(dev_ops, create_ah);
 	SET_DEVICE_OP(dev_ops, create_counters);
 	SET_DEVICE_OP(dev_ops, create_cq);
-	SET_DEVICE_OP(dev_ops, create_cq_umem);
+	SET_DEVICE_OP(dev_ops, create_user_cq);
 	SET_DEVICE_OP(dev_ops, create_flow);
 	SET_DEVICE_OP(dev_ops, create_qp);
 	SET_DEVICE_OP(dev_ops, create_rwq_ind_table);
···
 	SET_DEVICE_OP(dev_ops, get_netdev);
 	SET_DEVICE_OP(dev_ops, get_numa_node);
 	SET_DEVICE_OP(dev_ops, get_port_immutable);
-	SET_DEVICE_OP(dev_ops, get_vector_affinity);
 	SET_DEVICE_OP(dev_ops, get_vf_config);
 	SET_DEVICE_OP(dev_ops, get_vf_guid);
 	SET_DEVICE_OP(dev_ops, get_vf_stats);
···
 	SET_DEVICE_OP(dev_ops, reg_user_mr_dmabuf);
 	SET_DEVICE_OP(dev_ops, req_notify_cq);
 	SET_DEVICE_OP(dev_ops, rereg_user_mr);
-	SET_DEVICE_OP(dev_ops, resize_cq);
+	SET_DEVICE_OP(dev_ops, resize_user_cq);
 	SET_DEVICE_OP(dev_ops, set_vf_guid);
 	SET_DEVICE_OP(dev_ops, set_vf_link_state);
 	SET_DEVICE_OP(dev_ops, ufile_hw_cleanup);
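The device.c hunk does three things:

- It renames create_cq_umem to create_user_cq and resize_cq to resize_user_cq.
- It drops get_vector_affinity.
- It ORs the new uverbs_robust_udata flag into the device ops, the same way uverbs_no_driver_id_binding already is.

As we understand ib_set_device_ops(), SET_DEVICE_OP() copies a callback only when the incoming table provides it and the device's slot is still empty. Treat that as an assumption, and check device.c for the authoritative macro.

The model below shows the resulting merge semantics: callbacks are first-writer-wins, while flags accumulate. It uses a hypothetical struct dev_ops and SET_OP() macro, plus made-up drv_* callbacks for the usage example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, trimmed-down ops table: two callbacks and one flag. */
struct dev_ops {
	int (*create_user_cq)(int entries);
	int (*resize_user_cq)(int entries);
	bool robust_udata;
};

/* Fill an empty slot from ops; never overwrite one that is already set. */
#define SET_OP(dst, ops, name)				\
	do {						\
		if ((ops)->name && !(dst)->name)	\
			(dst)->name = (ops)->name;	\
	} while (0)

static void set_dev_ops(struct dev_ops *dst, const struct dev_ops *ops)
{
	assert(dst != NULL && ops != NULL);
	/* Capability flags accumulate: once any table sets one, it stays set. */
	dst->robust_udata |= ops->robust_udata;

	SET_OP(dst, ops, create_user_cq);
	SET_OP(dst, ops, resize_user_cq);
}

/* Made-up driver callbacks used only by the usage example. */
static int drv_a_create_cq(int entries) { return entries; }
static int drv_b_create_cq(int entries) { return -entries; }
static int drv_b_resize_cq(int entries) { return entries * 2; }
```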

+547

drivers/infiniband/core/frmr_pools.c

··· 1 + // SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB 2 + /* 3 + * Copyright (c) 2025, NVIDIA CORPORATION & AFFILIATES. All rights reserved. 4 + */ 5 + 6 + #include <linux/slab.h> 7 + #include <linux/rbtree.h> 8 + #include <linux/sort.h> 9 + #include <linux/spinlock.h> 10 + #include <rdma/ib_verbs.h> 11 + #include <linux/timer.h> 12 + 13 + #include "frmr_pools.h" 14 + 15 + #define FRMR_POOLS_DEFAULT_AGING_PERIOD_SECS 60 16 + 17 + static int push_handle_to_queue_locked(struct frmr_queue *queue, u32 handle) 18 + { 19 + u32 tmp = queue->ci % NUM_HANDLES_PER_PAGE; 20 + struct frmr_handles_page *page; 21 + 22 + if (queue->ci >= queue->num_pages * NUM_HANDLES_PER_PAGE) { 23 + page = kzalloc_obj(*page, GFP_ATOMIC); 24 + if (!page) 25 + return -ENOMEM; 26 + queue->num_pages++; 27 + list_add_tail(&page->list, &queue->pages_list); 28 + } else { 29 + page = list_last_entry(&queue->pages_list, 30 + struct frmr_handles_page, list); 31 + } 32 + 33 + page->handles[tmp] = handle; 34 + queue->ci++; 35 + return 0; 36 + } 37 + 38 + static u32 pop_handle_from_queue_locked(struct frmr_queue *queue) 39 + { 40 + u32 tmp = (queue->ci - 1) % NUM_HANDLES_PER_PAGE; 41 + struct frmr_handles_page *page; 42 + u32 handle; 43 + 44 + page = list_last_entry(&queue->pages_list, struct frmr_handles_page, 45 + list); 46 + handle = page->handles[tmp]; 47 + queue->ci--; 48 + 49 + if (!tmp) { 50 + list_del(&page->list); 51 + queue->num_pages--; 52 + kfree(page); 53 + } 54 + 55 + return handle; 56 + } 57 + 58 + static bool pop_frmr_handles_page(struct ib_frmr_pool *pool, 59 + struct frmr_queue *queue, 60 + struct frmr_handles_page **page, u32 *count) 61 + { 62 + spin_lock(&pool->lock); 63 + if (list_empty(&queue->pages_list)) { 64 + spin_unlock(&pool->lock); 65 + return false; 66 + } 67 + 68 + *page = list_first_entry(&queue->pages_list, struct frmr_handles_page, 69 + list); 70 + list_del(&(*page)->list); 71 + queue->num_pages--; 72 + 73 + /* If this is the last page, count may be less than 74 + * 
NUM_HANDLES_PER_PAGE. 75 + */ 76 + if (queue->ci >= NUM_HANDLES_PER_PAGE) 77 + *count = NUM_HANDLES_PER_PAGE; 78 + else 79 + *count = queue->ci; 80 + 81 + queue->ci -= *count; 82 + spin_unlock(&pool->lock); 83 + return true; 84 + } 85 + 86 + static void destroy_all_handles_in_queue(struct ib_device *device, 87 + struct ib_frmr_pool *pool, 88 + struct frmr_queue *queue) 89 + { 90 + struct ib_frmr_pools *pools = device->frmr_pools; 91 + struct frmr_handles_page *page; 92 + u32 count; 93 + 94 + while (pop_frmr_handles_page(pool, queue, &page, &count)) { 95 + pools->pool_ops->destroy_frmrs(device, page->handles, count); 96 + kfree(page); 97 + } 98 + } 99 + 100 + static bool age_pinned_pool(struct ib_device *device, struct ib_frmr_pool *pool) 101 + { 102 + struct ib_frmr_pools *pools = device->frmr_pools; 103 + u32 total, to_destroy, destroyed = 0; 104 + bool has_work = false; 105 + u32 *handles; 106 + u32 handle; 107 + 108 + spin_lock(&pool->lock); 109 + total = pool->queue.ci + pool->inactive_queue.ci + pool->in_use; 110 + if (total <= pool->pinned_handles) { 111 + spin_unlock(&pool->lock); 112 + return false; 113 + } 114 + 115 + to_destroy = total - pool->pinned_handles; 116 + 117 + handles = kcalloc(to_destroy, sizeof(*handles), GFP_ATOMIC); 118 + if (!handles) { 119 + spin_unlock(&pool->lock); 120 + return true; 121 + } 122 + 123 + /* Destroy all excess handles in the inactive queue */ 124 + while (pool->inactive_queue.ci && destroyed < to_destroy) { 125 + handles[destroyed++] = pop_handle_from_queue_locked( 126 + &pool->inactive_queue); 127 + } 128 + 129 + /* Move all handles from regular queue to inactive queue */ 130 + while (pool->queue.ci) { 131 + handle = pop_handle_from_queue_locked(&pool->queue); 132 + push_handle_to_queue_locked(&pool->inactive_queue, handle); 133 + has_work = true; 134 + } 135 + 136 + spin_unlock(&pool->lock); 137 + 138 + if (destroyed) 139 + pools->pool_ops->destroy_frmrs(device, handles, destroyed); 140 + kfree(handles); 141 + return 
has_work; 142 + } 143 + 144 + static void pool_aging_work(struct work_struct *work) 145 + { 146 + struct ib_frmr_pool *pool = container_of( 147 + to_delayed_work(work), struct ib_frmr_pool, aging_work); 148 + struct ib_frmr_pools *pools = pool->device->frmr_pools; 149 + bool has_work = false; 150 + 151 + if (pool->pinned_handles) { 152 + has_work = age_pinned_pool(pool->device, pool); 153 + goto out; 154 + } 155 + 156 + destroy_all_handles_in_queue(pool->device, pool, &pool->inactive_queue); 157 + 158 + /* Move all pages from regular queue to inactive queue */ 159 + spin_lock(&pool->lock); 160 + if (pool->queue.ci > 0) { 161 + list_splice_tail_init(&pool->queue.pages_list, 162 + &pool->inactive_queue.pages_list); 163 + pool->inactive_queue.num_pages = pool->queue.num_pages; 164 + pool->inactive_queue.ci = pool->queue.ci; 165 + 166 + pool->queue.num_pages = 0; 167 + pool->queue.ci = 0; 168 + has_work = true; 169 + } 170 + spin_unlock(&pool->lock); 171 + 172 + out: 173 + /* Reschedule if there are handles to age in next aging period */ 174 + if (has_work) 175 + queue_delayed_work( 176 + pools->aging_wq, &pool->aging_work, 177 + secs_to_jiffies(READ_ONCE(pools->aging_period_sec))); 178 + } 179 + 180 + static void destroy_frmr_pool(struct ib_device *device, 181 + struct ib_frmr_pool *pool) 182 + { 183 + cancel_delayed_work_sync(&pool->aging_work); 184 + destroy_all_handles_in_queue(device, pool, &pool->queue); 185 + destroy_all_handles_in_queue(device, pool, &pool->inactive_queue); 186 + 187 + kfree(pool); 188 + } 189 + 190 + /* 191 + * Initialize the FRMR pools for a device. 192 + * 193 + * @device: The device to initialize the FRMR pools for. 194 + * @pool_ops: The pool operations to use. 195 + * 196 + * Returns 0 on success, negative error code on failure. 
197 + */ 198 + int ib_frmr_pools_init(struct ib_device *device, 199 + const struct ib_frmr_pool_ops *pool_ops) 200 + { 201 + struct ib_frmr_pools *pools; 202 + 203 + pools = kzalloc_obj(*pools); 204 + if (!pools) 205 + return -ENOMEM; 206 + 207 + pools->rb_root = RB_ROOT; 208 + rwlock_init(&pools->rb_lock); 209 + pools->pool_ops = pool_ops; 210 + pools->aging_wq = create_singlethread_workqueue("frmr_aging_wq"); 211 + if (!pools->aging_wq) { 212 + kfree(pools); 213 + return -ENOMEM; 214 + } 215 + 216 + pools->aging_period_sec = FRMR_POOLS_DEFAULT_AGING_PERIOD_SECS; 217 + 218 + device->frmr_pools = pools; 219 + return 0; 220 + } 221 + EXPORT_SYMBOL(ib_frmr_pools_init); 222 + 223 + /* 224 + * Clean up the FRMR pools for a device. 225 + * 226 + * @device: The device to clean up the FRMR pools for. 227 + * 228 + * Call cleanup only after all FRMR handles have been pushed back to the pool 229 + * and no other FRMR operations are allowed to run in parallel. 230 + * Ensuring this allows us to save synchronization overhead in pop and push 231 + * operations. 
232 + */ 233 + void ib_frmr_pools_cleanup(struct ib_device *device) 234 + { 235 + struct ib_frmr_pools *pools = device->frmr_pools; 236 + struct ib_frmr_pool *pool, *next; 237 + 238 + if (!pools) 239 + return; 240 + 241 + rbtree_postorder_for_each_entry_safe(pool, next, &pools->rb_root, node) 242 + destroy_frmr_pool(device, pool); 243 + 244 + destroy_workqueue(pools->aging_wq); 245 + kfree(pools); 246 + device->frmr_pools = NULL; 247 + } 248 + EXPORT_SYMBOL(ib_frmr_pools_cleanup); 249 + 250 + int ib_frmr_pools_set_aging_period(struct ib_device *device, u32 period_sec) 251 + { 252 + struct ib_frmr_pools *pools = device->frmr_pools; 253 + struct ib_frmr_pool *pool; 254 + struct rb_node *node; 255 + 256 + if (!pools) 257 + return -EINVAL; 258 + 259 + if (period_sec == 0) 260 + return -EINVAL; 261 + 262 + WRITE_ONCE(pools->aging_period_sec, period_sec); 263 + 264 + read_lock(&pools->rb_lock); 265 + for (node = rb_first(&pools->rb_root); node; node = rb_next(node)) { 266 + pool = rb_entry(node, struct ib_frmr_pool, node); 267 + mod_delayed_work(pools->aging_wq, &pool->aging_work, 268 + secs_to_jiffies(period_sec)); 269 + } 270 + read_unlock(&pools->rb_lock); 271 + 272 + return 0; 273 + } 274 + 275 + static inline int compare_keys(struct ib_frmr_key *key1, 276 + struct ib_frmr_key *key2) 277 + { 278 + int res; 279 + 280 + res = cmp_int(key1->ats, key2->ats); 281 + if (res) 282 + return res; 283 + 284 + res = cmp_int(key1->access_flags, key2->access_flags); 285 + if (res) 286 + return res; 287 + 288 + res = cmp_int(key1->vendor_key, key2->vendor_key); 289 + if (res) 290 + return res; 291 + 292 + res = cmp_int(key1->kernel_vendor_key, key2->kernel_vendor_key); 293 + if (res) 294 + return res; 295 + 296 + /* 297 + * allow using handles that support more DMA blocks, up to twice the 298 + * requested number 299 + */ 300 + res = cmp_int(key1->num_dma_blocks, key2->num_dma_blocks); 301 + if (res > 0) { 302 + if (key1->num_dma_blocks - key2->num_dma_blocks < 303 + 
key2->num_dma_blocks) 304 + return 0; 305 + } 306 + 307 + return res; 308 + } 309 + 310 + static int frmr_pool_cmp_find(const void *key, const struct rb_node *node) 311 + { 312 + struct ib_frmr_pool *pool = rb_entry(node, struct ib_frmr_pool, node); 313 + 314 + return compare_keys(&pool->key, (struct ib_frmr_key *)key); 315 + } 316 + 317 + static int frmr_pool_cmp_add(struct rb_node *new, const struct rb_node *node) 318 + { 319 + struct ib_frmr_pool *new_pool = 320 + rb_entry(new, struct ib_frmr_pool, node); 321 + struct ib_frmr_pool *pool = rb_entry(node, struct ib_frmr_pool, node); 322 + 323 + return compare_keys(&pool->key, &new_pool->key); 324 + } 325 + 326 + static struct ib_frmr_pool *ib_frmr_pool_find(struct ib_frmr_pools *pools, 327 + struct ib_frmr_key *key) 328 + { 329 + struct ib_frmr_pool *pool; 330 + struct rb_node *node; 331 + 332 + /* find operation is done under read lock for performance reasons. 333 + * The case of threads failing to find the same pool and creating it 334 + * is handled by the create_frmr_pool function. 
335 + */ 336 + read_lock(&pools->rb_lock); 337 + node = rb_find(key, &pools->rb_root, frmr_pool_cmp_find); 338 + pool = rb_entry_safe(node, struct ib_frmr_pool, node); 339 + read_unlock(&pools->rb_lock); 340 + 341 + return pool; 342 + } 343 + 344 + static struct ib_frmr_pool *create_frmr_pool(struct ib_device *device, 345 + struct ib_frmr_key *key) 346 + { 347 + struct ib_frmr_pools *pools = device->frmr_pools; 348 + struct ib_frmr_pool *pool; 349 + struct rb_node *existing; 350 + 351 + pool = kzalloc_obj(*pool); 352 + if (!pool) 353 + return ERR_PTR(-ENOMEM); 354 + 355 + memcpy(&pool->key, key, sizeof(*key)); 356 + INIT_LIST_HEAD(&pool->queue.pages_list); 357 + INIT_LIST_HEAD(&pool->inactive_queue.pages_list); 358 + spin_lock_init(&pool->lock); 359 + INIT_DELAYED_WORK(&pool->aging_work, pool_aging_work); 360 + pool->device = device; 361 + 362 + write_lock(&pools->rb_lock); 363 + existing = rb_find_add(&pool->node, &pools->rb_root, frmr_pool_cmp_add); 364 + write_unlock(&pools->rb_lock); 365 + 366 + /* If a different thread has already created the pool, return it. 367 + * The insert operation is done under the write lock so we are sure 368 + * that the pool is not inserted twice. 
369 + */ 370 + if (existing) { 371 + kfree(pool); 372 + return rb_entry(existing, struct ib_frmr_pool, node); 373 + } 374 + 375 + return pool; 376 + } 377 + 378 + int ib_frmr_pools_set_pinned(struct ib_device *device, struct ib_frmr_key *key, 379 + u32 pinned_handles) 380 + { 381 + struct ib_frmr_pools *pools = device->frmr_pools; 382 + struct ib_frmr_key driver_key = {}; 383 + struct ib_frmr_pool *pool; 384 + u32 needed_handles; 385 + u32 current_total; 386 + int i, ret = 0; 387 + u32 *handles; 388 + 389 + if (!pools) 390 + return -EINVAL; 391 + 392 + ret = ib_check_mr_access(device, key->access_flags); 393 + if (ret) 394 + return ret; 395 + 396 + if (pools->pool_ops->build_key) { 397 + ret = pools->pool_ops->build_key(device, key, &driver_key); 398 + if (ret) 399 + return ret; 400 + } else { 401 + memcpy(&driver_key, key, sizeof(*key)); 402 + } 403 + 404 + pool = ib_frmr_pool_find(pools, &driver_key); 405 + if (!pool) { 406 + pool = create_frmr_pool(device, &driver_key); 407 + if (IS_ERR(pool)) 408 + return PTR_ERR(pool); 409 + } 410 + 411 + spin_lock(&pool->lock); 412 + current_total = pool->in_use + pool->queue.ci + pool->inactive_queue.ci; 413 + 414 + if (current_total < pinned_handles) 415 + needed_handles = pinned_handles - current_total; 416 + else 417 + needed_handles = 0; 418 + 419 + pool->pinned_handles = pinned_handles; 420 + spin_unlock(&pool->lock); 421 + 422 + if (!needed_handles) 423 + goto schedule_aging; 424 + 425 + handles = kcalloc(needed_handles, sizeof(*handles), GFP_KERNEL); 426 + if (!handles) 427 + return -ENOMEM; 428 + 429 + ret = pools->pool_ops->create_frmrs(device, key, handles, 430 + needed_handles); 431 + if (ret) { 432 + kfree(handles); 433 + return ret; 434 + } 435 + 436 + spin_lock(&pool->lock); 437 + for (i = 0; i < needed_handles; i++) { 438 + ret = push_handle_to_queue_locked(&pool->queue, 439 + handles[i]); 440 + if (ret) 441 + goto end; 442 + } 443 + 444 + end: 445 + spin_unlock(&pool->lock); 446 + kfree(handles); 447 + 448 + 
schedule_aging: 449 + /* Ensure aging is scheduled to adjust to new pinned handles count */ 450 + mod_delayed_work(pools->aging_wq, &pool->aging_work, 0); 451 + 452 + return ret; 453 + } 454 + 455 + static int get_frmr_from_pool(struct ib_device *device, 456 + struct ib_frmr_pool *pool, struct ib_mr *mr) 457 + { 458 + struct ib_frmr_pools *pools = device->frmr_pools; 459 + u32 handle; 460 + int err; 461 + 462 + spin_lock(&pool->lock); 463 + if (pool->queue.ci == 0) { 464 + if (pool->inactive_queue.ci > 0) { 465 + handle = pop_handle_from_queue_locked( 466 + &pool->inactive_queue); 467 + } else { 468 + spin_unlock(&pool->lock); 469 + err = pools->pool_ops->create_frmrs(device, &pool->key, 470 + &handle, 1); 471 + if (err) 472 + return err; 473 + spin_lock(&pool->lock); 474 + } 475 + } else { 476 + handle = pop_handle_from_queue_locked(&pool->queue); 477 + } 478 + 479 + pool->in_use++; 480 + if (pool->in_use > pool->max_in_use) 481 + pool->max_in_use = pool->in_use; 482 + 483 + spin_unlock(&pool->lock); 484 + 485 + mr->frmr.pool = pool; 486 + mr->frmr.handle = handle; 487 + 488 + return 0; 489 + } 490 + 491 + /* 492 + * Pop an FRMR handle from the pool. 493 + * 494 + * @device: The device to pop the FRMR handle from. 495 + * @mr: The MR to pop the FRMR handle from. 496 + * 497 + * Returns 0 on success, negative error code on failure. 498 + */ 499 + int ib_frmr_pool_pop(struct ib_device *device, struct ib_mr *mr) 500 + { 501 + struct ib_frmr_pools *pools = device->frmr_pools; 502 + struct ib_frmr_pool *pool; 503 + 504 + WARN_ON_ONCE(!device->frmr_pools); 505 + pool = ib_frmr_pool_find(pools, &mr->frmr.key); 506 + if (!pool) { 507 + pool = create_frmr_pool(device, &mr->frmr.key); 508 + if (IS_ERR(pool)) 509 + return PTR_ERR(pool); 510 + } 511 + 512 + return get_frmr_from_pool(device, pool, mr); 513 + } 514 + EXPORT_SYMBOL(ib_frmr_pool_pop); 515 + 516 + /* 517 + * Push an FRMR handle back to the pool. 518 + * 519 + * @device: The device to push the FRMR handle to. 
520 + * @mr: The MR containing the FRMR handle to push back to the pool. 521 + * 522 + * Returns 0 on success, negative error code on failure. 523 + */ 524 + int ib_frmr_pool_push(struct ib_device *device, struct ib_mr *mr) 525 + { 526 + struct ib_frmr_pool *pool = mr->frmr.pool; 527 + struct ib_frmr_pools *pools = device->frmr_pools; 528 + bool schedule_aging = false; 529 + int ret; 530 + 531 + spin_lock(&pool->lock); 532 + /* Schedule aging every time an empty pool becomes non-empty */ 533 + if (pool->queue.ci == 0) 534 + schedule_aging = true; 535 + ret = push_handle_to_queue_locked(&pool->queue, mr->frmr.handle); 536 + if (ret == 0) 537 + pool->in_use--; 538 + 539 + spin_unlock(&pool->lock); 540 + 541 + if (ret == 0 && schedule_aging) 542 + queue_delayed_work(pools->aging_wq, &pool->aging_work, 543 + secs_to_jiffies(READ_ONCE(pools->aging_period_sec))); 544 + 545 + return ret; 546 + } 547 + EXPORT_SYMBOL(ib_frmr_pool_push);
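
The pool bookkeeping above can be summarised with a small userspace model. This is a sketch, not the kernel implementation. The struct and helper names (`model_frmr_pool`, `model_pop`, `model_push`, `model_needed_handles`) are hypothetical, the queues are reduced to counters, and locking and driver errors are left out. It shows the order in which `get_frmr_from_pool()` looks for a handle and how `ib_frmr_pools_set_pinned()` works out how many new handles to request.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical userspace mirror of the counters in struct ib_frmr_pool.
 * The queues are reduced to counts; real handles are not tracked.
 */
struct model_frmr_pool {
	uint32_t queued;     /* queue.ci */
	uint32_t inactive;   /* inactive_queue.ci */
	uint32_t in_use;
	uint32_t max_in_use;
	uint32_t created;    /* handles requested from the driver's create_frmrs */
};

/* Pop order used by get_frmr_from_pool(): active queue, inactive queue, new. */
static void model_pop(struct model_frmr_pool *p)
{
	if (p->queued)
		p->queued--;
	else if (p->inactive)
		p->inactive--;
	else
		p->created++;    /* create_frmrs(..., &handle, 1) */

	p->in_use++;
	if (p->in_use > p->max_in_use)
		p->max_in_use = p->in_use;
}

/* ib_frmr_pool_push() always returns the handle to the active queue. */
static void model_push(struct model_frmr_pool *p)
{
	p->queued++;
	p->in_use--;
}

/*
 * Shortfall that ib_frmr_pools_set_pinned() asks the driver to create;
 * a target at or below the current total creates nothing.
 */
static uint32_t model_needed_handles(const struct model_frmr_pool *p,
				     uint32_t pinned)
{
	uint32_t total = p->in_use + p->queued + p->inactive;

	return total < pinned ? pinned - total : 0;
}
```

When the pinned target is at or below the pool's current total, the diff only reschedules the aging work, which presumably trims the pool toward the new target later.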

+63

drivers/infiniband/core/frmr_pools.h

··· 1 + /* SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB 2 + * 3 + * Copyright (c) 2025, NVIDIA CORPORATION & AFFILIATES. All rights reserved. 4 + */ 5 + 6 + #ifndef RDMA_CORE_FRMR_POOLS_H 7 + #define RDMA_CORE_FRMR_POOLS_H 8 + 9 + #include <rdma/frmr_pools.h> 10 + #include <linux/rbtree_types.h> 11 + #include <linux/spinlock_types.h> 12 + #include <linux/types.h> 13 + #include <asm/page.h> 14 + #include <linux/workqueue.h> 15 + 16 + #define NUM_HANDLES_PER_PAGE \ 17 + ((PAGE_SIZE - sizeof(struct list_head)) / sizeof(u32)) 18 + 19 + struct frmr_handles_page { 20 + struct list_head list; 21 + u32 handles[NUM_HANDLES_PER_PAGE]; 22 + }; 23 + 24 + /* FRMR queue holds a list of frmr_handles_page. 25 + * num_pages: number of pages in the queue. 26 + * ci: current index in the handles array across all pages. 27 + */ 28 + struct frmr_queue { 29 + struct list_head pages_list; 30 + u32 num_pages; 31 + unsigned long ci; 32 + }; 33 + 34 + struct ib_frmr_pool { 35 + struct rb_node node; 36 + struct ib_frmr_key key; /* Pool key */ 37 + 38 + /* Protect access to the queue */ 39 + spinlock_t lock; 40 + struct frmr_queue queue; 41 + struct frmr_queue inactive_queue; 42 + 43 + struct delayed_work aging_work; 44 + struct ib_device *device; 45 + 46 + u32 max_in_use; 47 + u32 in_use; 48 + u32 pinned_handles; 49 + }; 50 + 51 + struct ib_frmr_pools { 52 + struct rb_root rb_root; 53 + rwlock_t rb_lock; 54 + const struct ib_frmr_pool_ops *pool_ops; 55 + 56 + struct workqueue_struct *aging_wq; 57 + u32 aging_period_sec; 58 + }; 59 + 60 + int ib_frmr_pools_set_pinned(struct ib_device *device, struct ib_frmr_key *key, 61 + u32 pinned_handles); 62 + int ib_frmr_pools_set_aging_period(struct ib_device *device, u32 period_sec); 63 + #endif /* RDMA_CORE_FRMR_POOLS_H */
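
`NUM_HANDLES_PER_PAGE` sizes each `frmr_handles_page` so that the list linkage plus the handle array fill one page. The sketch below evaluates the same formula under assumed values (4 KiB pages, `struct list_head` modelled as two pointers) and shows one plausible way a global `ci` index could be split into a page and a slot. The push/pop helpers are not part of this hunk, so `model_ci_to_slot()` and all `model_*`/`MODEL_*` names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed page size; not read from the kernel. */
#define MODEL_PAGE_SIZE 4096UL

/* Same layout as struct list_head: two pointers. */
struct model_list_head {
	struct model_list_head *next, *prev;
};

/* Same formula as NUM_HANDLES_PER_PAGE in frmr_pools.h. */
#define MODEL_HANDLES_PER_PAGE \
	((MODEL_PAGE_SIZE - sizeof(struct model_list_head)) / sizeof(uint32_t))

/* Mirror of struct frmr_handles_page. */
struct model_handles_page {
	struct model_list_head list;
	uint32_t handles[MODEL_HANDLES_PER_PAGE];
};

/*
 * Hypothetical helper: split a global queue index ("ci" counts handles
 * across all pages) into a page number and a slot within that page.
 */
static void model_ci_to_slot(unsigned long ci, size_t *page, size_t *slot)
{
	*page = ci / MODEL_HANDLES_PER_PAGE;
	*slot = ci % MODEL_HANDLES_PER_PAGE;
}
```

On a 64-bit build each page then holds (4096 - 16) / 4 = 1020 handles, and the whole `model_handles_page` fits exactly in one 4 KiB page.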

+27

drivers/infiniband/core/ib_core_uverbs.c

··· 389 389 U32_MAX); 390 390 } 391 391 EXPORT_SYMBOL(rdma_user_mmap_entry_insert); 392 + 393 + /** 394 + * rdma_udata_to_dev - Get a ib_device from a udata 395 + * @udata: The system calls ib_udata struct 396 + * 397 + * The struct ib_device that is handling the uverbs call. Must not be called if 398 + * udata is NULL. The result can be NULL. 399 + */ 400 + struct ib_device *rdma_udata_to_dev(struct ib_udata *udata) 401 + { 402 + struct uverbs_attr_bundle *bundle = 403 + rdma_udata_to_uverbs_attr_bundle(udata); 404 + 405 + lockdep_assert_held(&bundle->ufile->device->disassociate_srcu); 406 + 407 + if (bundle->context) 408 + return bundle->context->device; 409 + 410 + /* 411 + * If the context hasn't been created yet use the ufile's dev, but it 412 + * might be NULL if we are racing with disassociate. 413 + */ 414 + return srcu_dereference(bundle->ufile->device->ib_dev, 415 + &bundle->ufile->device->disassociate_srcu); 416 + } 417 + EXPORT_SYMBOL(rdma_udata_to_dev); 418 +

+43

drivers/infiniband/core/iter.c

··· 1 + // SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB 2 + /* Copyright (c) 2026, NVIDIA CORPORATION & AFFILIATES. */ 3 + 4 + #include <linux/export.h> 5 + #include <rdma/iter.h> 6 + 7 + void __rdma_block_iter_start(struct ib_block_iter *biter, 8 + struct scatterlist *sglist, unsigned int nents, 9 + unsigned long pgsz) 10 + { 11 + memset(biter, 0, sizeof(struct ib_block_iter)); 12 + biter->__sg = sglist; 13 + biter->__sg_nents = nents; 14 + 15 + /* Driver provides best block size to use */ 16 + biter->__pg_bit = __fls(pgsz); 17 + } 18 + EXPORT_SYMBOL(__rdma_block_iter_start); 19 + 20 + bool __rdma_block_iter_next(struct ib_block_iter *biter) 21 + { 22 + unsigned int block_offset; 23 + unsigned int delta; 24 + 25 + if (!biter->__sg_nents || !biter->__sg) 26 + return false; 27 + 28 + biter->__dma_addr = sg_dma_address(biter->__sg) + biter->__sg_advance; 29 + block_offset = biter->__dma_addr & (BIT_ULL(biter->__pg_bit) - 1); 30 + delta = BIT_ULL(biter->__pg_bit) - block_offset; 31 + 32 + while (biter->__sg_nents && biter->__sg && 33 + sg_dma_len(biter->__sg) - biter->__sg_advance <= delta) { 34 + delta -= sg_dma_len(biter->__sg) - biter->__sg_advance; 35 + biter->__sg_advance = 0; 36 + biter->__sg = sg_next(biter->__sg); 37 + biter->__sg_nents--; 38 + } 39 + biter->__sg_advance += delta; 40 + 41 + return true; 42 + } 43 + EXPORT_SYMBOL(__rdma_block_iter_next);
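
`__rdma_block_iter_next()` walks a DMA-mapped scatterlist in fixed, power-of-two blocks. It records the current address, computes the distance to the next block boundary, and skips any scatterlist entries that end at or before that boundary. The userspace sketch below replays the same stepping logic over a plain array. `struct seg`, `struct block_iter` and the `block_iter_*` helpers are hypothetical stand-ins for the scatterlist and `struct ib_block_iter`, and GCC/Clang's `__builtin_clzll` replaces the kernel's `__fls()`. `block_iter_addr()` aligns the recorded address down to the block boundary, which is what the kernel's `rdma_block_iter_dma_address()` accessor is understood to do.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for a DMA-mapped scatterlist entry (address + length). */
struct seg {
	uint64_t addr;
	uint32_t len;
};

/* Mirrors the private fields of struct ib_block_iter used by iter.c. */
struct block_iter {
	const struct seg *sg;
	unsigned int nents;
	uint64_t advance;     /* bytes already consumed in *sg */
	unsigned int pg_bit;  /* log2 of the block size */
	uint64_t dma_addr;    /* address where the current block was entered */
};

static void block_iter_start(struct block_iter *it, const struct seg *sg,
			     unsigned int nents, uint64_t pgsz)
{
	*it = (struct block_iter){ .sg = sg, .nents = nents };
	/* Equivalent of __fls(pgsz): index of the highest set bit. */
	it->pg_bit = 63 - __builtin_clzll(pgsz);
}

/* Same stepping logic as __rdma_block_iter_next(), on a plain array. */
static bool block_iter_next(struct block_iter *it)
{
	uint64_t block_size = 1ULL << it->pg_bit;
	uint64_t delta;

	if (!it->nents || !it->sg)
		return false;

	it->dma_addr = it->sg->addr + it->advance;
	/* Distance from dma_addr to the next block boundary. */
	delta = block_size - (it->dma_addr & (block_size - 1));

	/* Consume every entry that ends at or before that boundary. */
	while (it->nents && it->sg->len - it->advance <= delta) {
		delta -= it->sg->len - it->advance;
		it->advance = 0;
		it->sg++;
		it->nents--;
	}
	it->advance += delta;
	return true;
}

/* Block-aligned address of the current block. */
static uint64_t block_iter_addr(const struct block_iter *it)
{
	return it->dma_addr & ~((1ULL << it->pg_bit) - 1);
}
```

For two mapped entries at 0x1800 (0x800 bytes) and 0x5000 (0x1000 bytes) with a 4 KiB block size, the loop yields two blocks, 0x1000 and 0x5000: the first entry ends exactly at the 0x2000 boundary, so a single iteration covers all of it.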

+3 -3

drivers/infiniband/core/iwpm_msg.c

··· 365 365 /* netlink attribute policy for the received response to register pid request */ 366 366 static const struct nla_policy resp_reg_policy[IWPM_NLA_RREG_PID_MAX] = { 367 367 [IWPM_NLA_RREG_PID_SEQ] = { .type = NLA_U32 }, 368 - [IWPM_NLA_RREG_IBDEV_NAME] = { .type = NLA_STRING, 368 + [IWPM_NLA_RREG_IBDEV_NAME] = { .type = NLA_NUL_STRING, 369 369 .len = IWPM_DEVNAME_SIZE - 1 }, 370 - [IWPM_NLA_RREG_ULIB_NAME] = { .type = NLA_STRING, 370 + [IWPM_NLA_RREG_ULIB_NAME] = { .type = NLA_NUL_STRING, 371 371 .len = IWPM_ULIBNAME_SIZE - 1 }, 372 372 [IWPM_NLA_RREG_ULIB_VER] = { .type = NLA_U16 }, 373 373 [IWPM_NLA_RREG_PID_ERR] = { .type = NLA_U16 } ··· 677 677 678 678 /* netlink attribute policy for the received request for mapping info */ 679 679 static const struct nla_policy resp_mapinfo_policy[IWPM_NLA_MAPINFO_REQ_MAX] = { 680 - [IWPM_NLA_MAPINFO_ULIB_NAME] = { .type = NLA_STRING, 680 + [IWPM_NLA_MAPINFO_ULIB_NAME] = { .type = NLA_NUL_STRING, 681 681 .len = IWPM_ULIBNAME_SIZE - 1 }, 682 682 [IWPM_NLA_MAPINFO_ULIB_VER] = { .type = NLA_U16 } 683 683 };
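
The policy change swaps `NLA_STRING` for `NLA_NUL_STRING` on the three name attributes. The following is an approximate userspace model of the netlink validator, written for illustration and not copied from `lib/nlattr.c`: both types cap the length at the policy's `.len`, ignoring one trailing NUL, but only `NLA_NUL_STRING` rejects a payload that has no NUL within the first `.len + 1` bytes. The names `model_validate_string` and `MODEL_NLA_*` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

enum model_nla_type { MODEL_NLA_STRING, MODEL_NLA_NUL_STRING };

/*
 * Approximate model of netlink string validation. 'maxlen' plays the role
 * of the policy's .len (string length excluding the terminating NUL).
 */
static bool model_validate_string(enum model_nla_type type, const char *data,
				  size_t attrlen, size_t maxlen)
{
	if (type == MODEL_NLA_NUL_STRING) {
		size_t scan = (maxlen && maxlen + 1 < attrlen) ? maxlen + 1 : attrlen;

		/* A NUL terminator must appear within the first maxlen + 1 bytes. */
		if (!scan || !memchr(data, '\0', scan))
			return false;
	}

	if (attrlen < 1)
		return false;
	if (maxlen) {
		/* A single trailing NUL does not count towards the limit. */
		if (data[attrlen - 1] == '\0')
			attrlen--;
		if (attrlen > maxlen)
			return false;
	}
	return true;
}
```

So a 3-byte payload `abc` with no terminator passes an `NLA_STRING` policy but fails an `NLA_NUL_STRING` one, presumably the point of the change, since the iwpm code consumes these names as NUL-terminated strings.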

+298

drivers/infiniband/core/nldev.c

··· 37 37 #include <net/netlink.h> 38 38 #include <rdma/rdma_cm.h> 39 39 #include <rdma/rdma_netlink.h> 40 + #include <rdma/frmr_pools.h> 40 41 41 42 #include "core_priv.h" 42 43 #include "cma_priv.h" 43 44 #include "restrack.h" 44 45 #include "uverbs.h" 46 + #include "frmr_pools.h" 45 47 46 48 /* 47 49 * This determines whether a non-privileged user is allowed to specify a ··· 174 172 [RDMA_NLDEV_ATTR_NAME_ASSIGN_TYPE] = { .type = NLA_U8 }, 175 173 [RDMA_NLDEV_ATTR_EVENT_TYPE] = { .type = NLA_U8 }, 176 174 [RDMA_NLDEV_ATTR_STAT_OPCOUNTER_ENABLED] = { .type = NLA_U8 }, 175 + [RDMA_NLDEV_ATTR_FRMR_POOLS] = { .type = NLA_NESTED }, 176 + [RDMA_NLDEV_ATTR_FRMR_POOL_ENTRY] = { .type = NLA_NESTED }, 177 + [RDMA_NLDEV_ATTR_FRMR_POOL_KEY] = { .type = NLA_NESTED }, 178 + [RDMA_NLDEV_ATTR_FRMR_POOL_KEY_ATS] = { .type = NLA_U8 }, 179 + [RDMA_NLDEV_ATTR_FRMR_POOL_KEY_ACCESS_FLAGS] = { .type = NLA_U32 }, 180 + [RDMA_NLDEV_ATTR_FRMR_POOL_KEY_VENDOR_KEY] = { .type = NLA_U64 }, 181 + [RDMA_NLDEV_ATTR_FRMR_POOL_KEY_NUM_DMA_BLOCKS] = { .type = NLA_U64 }, 182 + [RDMA_NLDEV_ATTR_FRMR_POOL_QUEUE_HANDLES] = { .type = NLA_U32 }, 183 + [RDMA_NLDEV_ATTR_FRMR_POOL_MAX_IN_USE] = { .type = NLA_U64 }, 184 + [RDMA_NLDEV_ATTR_FRMR_POOL_IN_USE] = { .type = NLA_U64 }, 185 + [RDMA_NLDEV_ATTR_FRMR_POOLS_AGING_PERIOD] = { .type = NLA_U32 }, 186 + [RDMA_NLDEV_ATTR_FRMR_POOL_PINNED_HANDLES] = { .type = NLA_U32 }, 187 + [RDMA_NLDEV_ATTR_FRMR_POOL_KEY_KERNEL_VENDOR_KEY] = { .type = NLA_U64 }, 177 188 }; 178 189 179 190 static int put_driver_name_print_type(struct sk_buff *msg, const char *name, ··· 1839 1824 return -EINVAL; 1840 1825 } 1841 1826 1827 + /* 1828 + * This path is triggered by the 'rdma link delete' administrative command. 1829 + * For Soft-RoCE (RXE), we ensure that transport sockets are closed here. 1830 + * Note: iWARP driver does not implement .dellink, so this logic is 1831 + * implicitly scoped to the driver supporting dynamic link deletion like RXE. 
1832 + */ 1833 + if (device->link_ops && device->link_ops->dellink) { 1834 + err = device->link_ops->dellink(device); 1835 + if (err) 1836 + return err; 1837 + } 1838 + 1842 1839 ib_unregister_device_and_put(device); 1843 1840 return 0; 1844 1841 } ··· 2664 2637 return ib_del_sub_device_and_put(device); 2665 2638 } 2666 2639 2640 + static int fill_frmr_pool_key(struct sk_buff *msg, struct ib_frmr_key *key) 2641 + { 2642 + struct nlattr *key_attr; 2643 + 2644 + key_attr = nla_nest_start(msg, RDMA_NLDEV_ATTR_FRMR_POOL_KEY); 2645 + if (!key_attr) 2646 + return -EMSGSIZE; 2647 + 2648 + if (nla_put_u8(msg, RDMA_NLDEV_ATTR_FRMR_POOL_KEY_ATS, key->ats)) 2649 + goto err; 2650 + if (nla_put_u32(msg, RDMA_NLDEV_ATTR_FRMR_POOL_KEY_ACCESS_FLAGS, 2651 + key->access_flags)) 2652 + goto err; 2653 + if (nla_put_u64_64bit(msg, RDMA_NLDEV_ATTR_FRMR_POOL_KEY_VENDOR_KEY, 2654 + key->vendor_key, RDMA_NLDEV_ATTR_PAD)) 2655 + goto err; 2656 + if (nla_put_u64_64bit(msg, RDMA_NLDEV_ATTR_FRMR_POOL_KEY_NUM_DMA_BLOCKS, 2657 + key->num_dma_blocks, RDMA_NLDEV_ATTR_PAD)) 2658 + goto err; 2659 + 2660 + if (key->kernel_vendor_key && 2661 + nla_put_u64_64bit(msg, 2662 + RDMA_NLDEV_ATTR_FRMR_POOL_KEY_KERNEL_VENDOR_KEY, 2663 + key->kernel_vendor_key, RDMA_NLDEV_ATTR_PAD)) 2664 + goto err; 2665 + 2666 + nla_nest_end(msg, key_attr); 2667 + return 0; 2668 + 2669 + err: 2670 + return -EMSGSIZE; 2671 + } 2672 + 2673 + static int fill_frmr_pool_entry(struct sk_buff *msg, struct ib_frmr_pool *pool) 2674 + { 2675 + if (fill_frmr_pool_key(msg, &pool->key)) 2676 + return -EMSGSIZE; 2677 + 2678 + spin_lock(&pool->lock); 2679 + if (nla_put_u32(msg, RDMA_NLDEV_ATTR_FRMR_POOL_QUEUE_HANDLES, 2680 + pool->queue.ci + pool->inactive_queue.ci)) 2681 + goto err_unlock; 2682 + if (nla_put_u64_64bit(msg, RDMA_NLDEV_ATTR_FRMR_POOL_MAX_IN_USE, 2683 + pool->max_in_use, RDMA_NLDEV_ATTR_PAD)) 2684 + goto err_unlock; 2685 + if (nla_put_u64_64bit(msg, RDMA_NLDEV_ATTR_FRMR_POOL_IN_USE, 2686 + pool->in_use, RDMA_NLDEV_ATTR_PAD)) 
2687 + goto err_unlock; 2688 + if (nla_put_u32(msg, RDMA_NLDEV_ATTR_FRMR_POOL_PINNED_HANDLES, 2689 + pool->pinned_handles)) 2690 + goto err_unlock; 2691 + spin_unlock(&pool->lock); 2692 + 2693 + return 0; 2694 + 2695 + err_unlock: 2696 + spin_unlock(&pool->lock); 2697 + return -EMSGSIZE; 2698 + } 2699 + 2700 + static int nldev_frmr_pools_parse_key(struct nlattr *tb[], 2701 + struct ib_frmr_key *key, 2702 + struct netlink_ext_ack *extack) 2703 + { 2704 + if (tb[RDMA_NLDEV_ATTR_FRMR_POOL_KEY_ATS]) 2705 + key->ats = nla_get_u8(tb[RDMA_NLDEV_ATTR_FRMR_POOL_KEY_ATS]); 2706 + 2707 + if (tb[RDMA_NLDEV_ATTR_FRMR_POOL_KEY_ACCESS_FLAGS]) 2708 + key->access_flags = nla_get_u32( 2709 + tb[RDMA_NLDEV_ATTR_FRMR_POOL_KEY_ACCESS_FLAGS]); 2710 + 2711 + if (tb[RDMA_NLDEV_ATTR_FRMR_POOL_KEY_VENDOR_KEY]) 2712 + key->vendor_key = nla_get_u64( 2713 + tb[RDMA_NLDEV_ATTR_FRMR_POOL_KEY_VENDOR_KEY]); 2714 + 2715 + if (tb[RDMA_NLDEV_ATTR_FRMR_POOL_KEY_NUM_DMA_BLOCKS]) 2716 + key->num_dma_blocks = nla_get_u64( 2717 + tb[RDMA_NLDEV_ATTR_FRMR_POOL_KEY_NUM_DMA_BLOCKS]); 2718 + 2719 + if (tb[RDMA_NLDEV_ATTR_FRMR_POOL_KEY_KERNEL_VENDOR_KEY]) 2720 + return -EINVAL; 2721 + 2722 + return 0; 2723 + } 2724 + 2725 + static int nldev_frmr_pools_set_pinned(struct ib_device *device, 2726 + struct nlattr *tb[], 2727 + struct netlink_ext_ack *extack) 2728 + { 2729 + struct nlattr *key_tb[RDMA_NLDEV_ATTR_MAX]; 2730 + struct ib_frmr_key key = { 0 }; 2731 + u32 pinned_handles = 0; 2732 + int err = 0; 2733 + 2734 + pinned_handles = 2735 + nla_get_u32(tb[RDMA_NLDEV_ATTR_FRMR_POOL_PINNED_HANDLES]); 2736 + 2737 + if (!tb[RDMA_NLDEV_ATTR_FRMR_POOL_KEY]) 2738 + return -EINVAL; 2739 + 2740 + err = nla_parse_nested(key_tb, RDMA_NLDEV_ATTR_MAX - 1, 2741 + tb[RDMA_NLDEV_ATTR_FRMR_POOL_KEY], nldev_policy, 2742 + extack); 2743 + if (err) 2744 + return err; 2745 + 2746 + err = nldev_frmr_pools_parse_key(key_tb, &key, extack); 2747 + if (err) 2748 + return err; 2749 + 2750 + err = ib_frmr_pools_set_pinned(device, &key, 
pinned_handles); 2751 + 2752 + return err; 2753 + } 2754 + 2755 + static int nldev_frmr_pools_get_dumpit(struct sk_buff *skb, 2756 + struct netlink_callback *cb) 2757 + { 2758 + struct nlattr *tb[RDMA_NLDEV_ATTR_MAX]; 2759 + struct ib_frmr_pools *pools; 2760 + int err, ret = 0, idx = 0; 2761 + struct ib_frmr_pool *pool; 2762 + struct nlattr *table_attr; 2763 + struct nlattr *entry_attr; 2764 + bool show_details = false; 2765 + struct ib_device *device; 2766 + int start = cb->args[0]; 2767 + struct rb_node *node; 2768 + struct nlmsghdr *nlh; 2769 + bool filled = false; 2770 + 2771 + err = __nlmsg_parse(cb->nlh, 0, tb, RDMA_NLDEV_ATTR_MAX - 1, 2772 + nldev_policy, NL_VALIDATE_LIBERAL, NULL); 2773 + if (err || !tb[RDMA_NLDEV_ATTR_DEV_INDEX]) 2774 + return -EINVAL; 2775 + 2776 + device = ib_device_get_by_index( 2777 + sock_net(skb->sk), nla_get_u32(tb[RDMA_NLDEV_ATTR_DEV_INDEX])); 2778 + if (!device) 2779 + return -EINVAL; 2780 + 2781 + if (tb[RDMA_NLDEV_ATTR_DRIVER_DETAILS]) 2782 + show_details = nla_get_u8(tb[RDMA_NLDEV_ATTR_DRIVER_DETAILS]); 2783 + 2784 + pools = device->frmr_pools; 2785 + if (!pools) { 2786 + ib_device_put(device); 2787 + return 0; 2788 + } 2789 + 2790 + nlh = nlmsg_put(skb, NETLINK_CB(cb->skb).portid, cb->nlh->nlmsg_seq, 2791 + RDMA_NL_GET_TYPE(RDMA_NL_NLDEV, 2792 + RDMA_NLDEV_CMD_FRMR_POOLS_GET), 2793 + 0, NLM_F_MULTI); 2794 + 2795 + if (!nlh || fill_nldev_handle(skb, device)) { 2796 + ret = -EMSGSIZE; 2797 + goto err; 2798 + } 2799 + 2800 + table_attr = nla_nest_start_noflag(skb, RDMA_NLDEV_ATTR_FRMR_POOLS); 2801 + if (!table_attr) { 2802 + ret = -EMSGSIZE; 2803 + goto err; 2804 + } 2805 + 2806 + read_lock(&pools->rb_lock); 2807 + for (node = rb_first(&pools->rb_root); node; node = rb_next(node)) { 2808 + pool = rb_entry(node, struct ib_frmr_pool, node); 2809 + if (pool->key.kernel_vendor_key && !show_details) 2810 + continue; 2811 + 2812 + if (idx < start) { 2813 + idx++; 2814 + continue; 2815 + } 2816 + 2817 + filled = true; 2818 + 2819 + 
entry_attr = nla_nest_start_noflag( 2820 + skb, RDMA_NLDEV_ATTR_FRMR_POOL_ENTRY); 2821 + if (!entry_attr) { 2822 + ret = -EMSGSIZE; 2823 + goto end_msg; 2824 + } 2825 + 2826 + if (fill_frmr_pool_entry(skb, pool)) { 2827 + nla_nest_cancel(skb, entry_attr); 2828 + ret = -EMSGSIZE; 2829 + goto end_msg; 2830 + } 2831 + 2832 + nla_nest_end(skb, entry_attr); 2833 + idx++; 2834 + } 2835 + end_msg: 2836 + read_unlock(&pools->rb_lock); 2837 + 2838 + nla_nest_end(skb, table_attr); 2839 + nlmsg_end(skb, nlh); 2840 + cb->args[0] = idx; 2841 + 2842 + /* 2843 + * No more entries to fill, cancel the message and 2844 + * return 0 to mark end of dumpit. 2845 + */ 2846 + if (!filled) 2847 + goto err; 2848 + 2849 + ib_device_put(device); 2850 + return skb->len; 2851 + 2852 + err: 2853 + nlmsg_cancel(skb, nlh); 2854 + ib_device_put(device); 2855 + return ret; 2856 + } 2857 + 2858 + static int nldev_frmr_pools_set_doit(struct sk_buff *skb, struct nlmsghdr *nlh, 2859 + struct netlink_ext_ack *extack) 2860 + { 2861 + struct ib_device *device; 2862 + struct nlattr **tb; 2863 + u32 aging_period; 2864 + int err; 2865 + 2866 + tb = kzalloc_objs(*tb, RDMA_NLDEV_ATTR_MAX, GFP_KERNEL); 2867 + if (!tb) 2868 + return -ENOMEM; 2869 + 2870 + err = nlmsg_parse(nlh, 0, tb, RDMA_NLDEV_ATTR_MAX - 1, nldev_policy, 2871 + extack); 2872 + if (err) 2873 + goto free_tb; 2874 + 2875 + if (!tb[RDMA_NLDEV_ATTR_DEV_INDEX]) { 2876 + err = -EINVAL; 2877 + goto free_tb; 2878 + } 2879 + 2880 + device = ib_device_get_by_index( 2881 + sock_net(skb->sk), nla_get_u32(tb[RDMA_NLDEV_ATTR_DEV_INDEX])); 2882 + if (!device) { 2883 + err = -EINVAL; 2884 + goto free_tb; 2885 + } 2886 + 2887 + if (tb[RDMA_NLDEV_ATTR_FRMR_POOLS_AGING_PERIOD]) { 2888 + aging_period = nla_get_u32( 2889 + tb[RDMA_NLDEV_ATTR_FRMR_POOLS_AGING_PERIOD]); 2890 + err = ib_frmr_pools_set_aging_period(device, aging_period); 2891 + goto done; 2892 + } 2893 + 2894 + if (tb[RDMA_NLDEV_ATTR_FRMR_POOL_PINNED_HANDLES]) 2895 + err = 
nldev_frmr_pools_set_pinned(device, tb, extack); 2896 + 2897 + done: 2898 + ib_device_put(device); 2899 + free_tb: 2900 + kfree(tb); 2901 + return err; 2902 + } 2903 + 2667 2904 static const struct rdma_nl_cbs nldev_cb_table[RDMA_NLDEV_NUM_OPS] = { 2668 2905 [RDMA_NLDEV_CMD_GET] = { 2669 2906 .doit = nldev_get_doit, ··· 3032 2741 }, 3033 2742 [RDMA_NLDEV_CMD_DELDEV] = { 3034 2743 .doit = nldev_deldev, 2744 + .flags = RDMA_NL_ADMIN_PERM, 2745 + }, 2746 + [RDMA_NLDEV_CMD_FRMR_POOLS_GET] = { 2747 + .dump = nldev_frmr_pools_get_dumpit, 2748 + }, 2749 + [RDMA_NLDEV_CMD_FRMR_POOLS_SET] = { 2750 + .doit = nldev_frmr_pools_set_doit, 3035 2751 .flags = RDMA_NL_ADMIN_PERM, 3036 2752 }, 3037 2753 };
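
`nldev_frmr_pools_get_dumpit()` is a resumable netlink dump. `cb->args[0]` stores how many visible pools have been emitted so far, each pass skips that many, and a pass that finds nothing left ends the dump. The sketch below models one pass over an array instead of the rb-tree. `struct model_pool`, `model_dump_pass()` and the `room` limit (standing in for running out of skb space) are hypothetical, and locking is omitted.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flattened view of one pool as the dump walks the rb-tree. */
struct model_pool {
	int id;
	bool kernel_only;   /* stands in for key.kernel_vendor_key != 0 */
};

/*
 * One dump pass: skip pools hidden without show_details, skip the first
 * 'start' visible pools (sent in earlier passes), then emit until 'room'
 * runs out. Returns the value to store in cb->args[0]; *filled stays false
 * once nothing is left, which ends the dump.
 */
static int model_dump_pass(const struct model_pool *pools, int n, int start,
			   bool show_details, int room, int *out, int *nout,
			   bool *filled)
{
	int idx = 0;

	*nout = 0;
	*filled = false;
	for (int i = 0; i < n; i++) {
		if (pools[i].kernel_only && !show_details)
			continue;
		if (idx < start) {
			idx++;
			continue;
		}
		*filled = true;
		if (*nout == room)   /* message full: -EMSGSIZE in the kernel */
			break;
		out[(*nout)++] = pools[i].id;
		idx++;
	}
	return idx;
}
```

Pools hidden by the `kernel_vendor_key` filter never advance the index, so the resume point stays consistent as long as every pass of one dump uses the same `show_details` value.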

+2 -2

drivers/infiniband/core/rdma_core.c

··· 590 590 void *old; 591 591 592 592 /* 593 - * We already allocated this IDR with a NULL object, so 593 + * We already allocated this XArray entry with a NULL pointer, so 594 594 * this shouldn't fail. 595 595 * 596 596 * NOTE: Storing the uobj transfers our kref on uobj to the XArray. 597 - * It will be put by remove_commit_idr_uobject() 597 + * It will be put by remove_handle_idr_uobject() 598 598 */ 599 599 old = xa_store(&ufile->idr, uobj->id, uobj, GFP_KERNEL); 600 600 WARN_ON(old != NULL);

+3

drivers/infiniband/core/rdma_core.h

··· 151 151 unsigned int num_attrs); 152 152 void uverbs_user_mmap_disassociate(struct ib_uverbs_file *ufile); 153 153 154 + typedef int (*uverbs_api_ioctl_handler_fn)(struct uverbs_attr_bundle *attrs); 155 + uverbs_api_ioctl_handler_fn uverbs_get_handler_fn(struct ib_udata *udata); 156 + 154 157 extern const struct uapi_definition uverbs_def_obj_async_fd[]; 155 158 extern const struct uapi_definition uverbs_def_obj_counters[]; 156 159 extern const struct uapi_definition uverbs_def_obj_cq[];

+7 -8

drivers/infiniband/core/umem.c

··· 55 55 56 56 if (dirty) 57 57 ib_dma_unmap_sgtable_attrs(dev, &umem->sgt_append.sgt, 58 - DMA_BIDIRECTIONAL, 59 - DMA_ATTR_REQUIRE_COHERENT); 58 + DMA_BIDIRECTIONAL, umem->dma_attrs); 60 59 61 60 for_each_sgtable_sg(&umem->sgt_append.sgt, sg, i) { 62 61 unpin_user_page_range_dirty_lock(sg_page(sg), ··· 169 170 unsigned long lock_limit; 170 171 unsigned long new_pinned; 171 172 unsigned long cur_base; 172 - unsigned long dma_attr = DMA_ATTR_REQUIRE_COHERENT; 173 173 struct mm_struct *mm; 174 174 unsigned long npages; 175 175 int pinned, ret; ··· 201 203 umem->iova = addr; 202 204 umem->writable = ib_access_writable(access); 203 205 umem->owning_mm = mm = current->mm; 206 + umem->dma_attrs = DMA_ATTR_REQUIRE_COHERENT; 207 + if (access & IB_ACCESS_RELAXED_ORDERING) 208 + umem->dma_attrs |= DMA_ATTR_WEAK_ORDERING; 209 + 204 210 mmgrab(mm); 205 211 206 212 page_list = (struct page **) __get_free_page(GFP_KERNEL); ··· 257 255 } 258 256 } 259 257 260 - if (access & IB_ACCESS_RELAXED_ORDERING) 261 - dma_attr |= DMA_ATTR_WEAK_ORDERING; 262 - 263 258 ret = ib_dma_map_sgtable_attrs(device, &umem->sgt_append.sgt, 264 - DMA_BIDIRECTIONAL, dma_attr); 259 + DMA_BIDIRECTIONAL, umem->dma_attrs); 265 260 if (ret) 266 261 goto umem_release; 267 262 goto out; ··· 283 284 */ 284 285 void ib_umem_release(struct ib_umem *umem) 285 286 { 286 - if (!umem) 287 + if (IS_ERR_OR_NULL(umem)) 287 288 return; 288 289 if (umem->is_dmabuf) 289 290 return ib_umem_dmabuf_release(to_ib_umem_dmabuf(umem));
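
The `umem.c` change stores the DMA attributes in the umem at pin time, so `__ib_umem_release()` unmaps with the same attributes used for mapping. Previously the unmap path always passed `DMA_ATTR_REQUIRE_COHERENT` alone, even when `IB_ACCESS_RELAXED_ORDERING` had added `DMA_ATTR_WEAK_ORDERING` at map time. Below is a minimal model of that symmetry; the flag values are placeholders (the real `DMA_ATTR_*` constants come from the kernel headers) and all `model_*`/`MODEL_*` names are hypothetical.

```c
#include <assert.h>

/* Placeholder bit values, not the kernel's. */
#define MODEL_DMA_ATTR_WEAK_ORDERING    (1UL << 0)
#define MODEL_DMA_ATTR_REQUIRE_COHERENT (1UL << 1)
#define MODEL_ACCESS_RELAXED_ORDERING   (1U << 0)

struct model_umem {
	unsigned long dma_attrs;
	unsigned long mapped_with;    /* attrs passed to the map call */
	unsigned long unmapped_with;  /* attrs passed to the unmap call */
};

/* Decide the attributes once, at pin time, as ib_umem_get() now does. */
static void model_umem_get(struct model_umem *u, unsigned int access)
{
	u->dma_attrs = MODEL_DMA_ATTR_REQUIRE_COHERENT;
	if (access & MODEL_ACCESS_RELAXED_ORDERING)
		u->dma_attrs |= MODEL_DMA_ATTR_WEAK_ORDERING;
	u->mapped_with = u->dma_attrs;
}

/* Release reuses the stored value, so map and unmap always agree. */
static void model_umem_release(struct model_umem *u)
{
	u->unmapped_with = u->dma_attrs;
}
```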

+120 -18

drivers/infiniband/core/umem_dmabuf.c

··· 185 185 .allow_peer2peer = true, 186 186 }; 187 187 188 - struct ib_umem_dmabuf * 189 - ib_umem_dmabuf_get_pinned_with_dma_device(struct ib_device *device, 190 - struct device *dma_device, 191 - unsigned long offset, size_t size, 192 - int fd, int access) 188 + static void ib_umem_dmabuf_revoke_locked(struct dma_buf_attachment *attach) 189 + { 190 + struct ib_umem_dmabuf *umem_dmabuf = attach->importer_priv; 191 + 192 + dma_resv_assert_held(attach->dmabuf->resv); 193 + 194 + if (umem_dmabuf->revoked) 195 + return; 196 + 197 + if (umem_dmabuf->pinned_revoke) 198 + umem_dmabuf->pinned_revoke(umem_dmabuf->private); 199 + 200 + ib_umem_dmabuf_unmap_pages(umem_dmabuf); 201 + if (umem_dmabuf->pinned) { 202 + dma_buf_unpin(umem_dmabuf->attach); 203 + umem_dmabuf->pinned = 0; 204 + } 205 + umem_dmabuf->revoked = 1; 206 + } 207 + 208 + static struct dma_buf_attach_ops ib_umem_dmabuf_attach_pinned_revocable_ops = { 209 + .allow_peer2peer = true, 210 + .invalidate_mappings = ib_umem_dmabuf_revoke_locked, 211 + }; 212 + 213 + static struct ib_umem_dmabuf * 214 + ib_umem_dmabuf_get_pinned_and_lock(struct ib_device *device, 215 + struct device *dma_device, 216 + unsigned long offset, 217 + size_t size, int fd, int access, 218 + const struct dma_buf_attach_ops *ops) 193 219 { 194 220 struct ib_umem_dmabuf *umem_dmabuf; 195 221 int err; 196 222 197 - umem_dmabuf = ib_umem_dmabuf_get_with_dma_device(device, dma_device, offset, 198 - size, fd, access, 199 - &ib_umem_dmabuf_attach_pinned_ops); 223 + umem_dmabuf = 224 + ib_umem_dmabuf_get_with_dma_device(device, dma_device, offset, 225 + size, fd, access, ops); 200 226 if (IS_ERR(umem_dmabuf)) 201 227 return umem_dmabuf; 202 228 ··· 235 209 err = ib_umem_dmabuf_map_pages(umem_dmabuf); 236 210 if (err) 237 211 goto err_release; 238 - dma_resv_unlock(umem_dmabuf->attach->dmabuf->resv); 239 212 240 213 return umem_dmabuf; 241 214 ··· 243 218 ib_umem_release(&umem_dmabuf->umem); 244 219 return ERR_PTR(err); 245 220 } 221 + 222 + 
struct ib_umem_dmabuf * 223 + ib_umem_dmabuf_get_pinned_with_dma_device(struct ib_device *device, 224 + struct device *dma_device, 225 + unsigned long offset, size_t size, 226 + int fd, int access) 227 + { 228 + struct ib_umem_dmabuf *umem_dmabuf = 229 + ib_umem_dmabuf_get_pinned_and_lock(device, dma_device, offset, 230 + size, fd, access, 231 + &ib_umem_dmabuf_attach_pinned_ops); 232 + if (IS_ERR(umem_dmabuf)) 233 + return umem_dmabuf; 234 + 235 + dma_resv_unlock(umem_dmabuf->attach->dmabuf->resv); 236 + return umem_dmabuf; 237 + } 246 238 EXPORT_SYMBOL(ib_umem_dmabuf_get_pinned_with_dma_device); 239 + 240 + /** 241 + * ib_umem_dmabuf_get_pinned_revocable_and_lock - Map & pin a revocable dmabuf 242 + * @device: IB device. 243 + * @offset: Start offset. 244 + * @size: Length. 245 + * @fd: dmabuf fd. 246 + * @access: Access flags. 247 + * 248 + * Obtains a umem from a dmabuf for drivers/devices that can support revocation. 249 + * 250 + * Returns with dma_resv_lock held upon success. The driver must set the revoke 251 + * callback prior to unlock by calling ib_umem_dmabuf_set_revoke_locked(). 252 + * 253 + * When a revocation occurs, the revoke callback will be called. The driver must 254 + * ensure that the region is no longer accessed when the callback returns. Any 255 + * subsequent access attempts should also probably cause an AE for MRs. 256 + * 257 + * If the umem is used for an MR, the driver must ensure that the key remains in 258 + * use such that it cannot be obtained by a new region until this region is 259 + * fully deregistered (i.e., ibv_dereg_mr). If a driver needs to serialize with 260 + * revoke calls, it can use dma_resv_lock. 261 + * 262 + * If successful, then the revoke callback may be called at any time and will 263 + * also be called automatically upon ib_umem_release (serialized). The revoke 264 + * callback will be called one time at most. 265 + * 266 + * Return: A pointer to ib_umem_dmabuf on success, or an ERR_PTR on failure. 
267 + */ 268 + struct ib_umem_dmabuf * 269 + ib_umem_dmabuf_get_pinned_revocable_and_lock(struct ib_device *device, 270 + unsigned long offset, size_t size, 271 + int fd, int access) 272 + { 273 + const struct dma_buf_attach_ops *ops = 274 + &ib_umem_dmabuf_attach_pinned_revocable_ops; 275 + 276 + return ib_umem_dmabuf_get_pinned_and_lock(device, device->dma_device, 277 + offset, size, fd, access, 278 + ops); 279 + } 280 + EXPORT_SYMBOL(ib_umem_dmabuf_get_pinned_revocable_and_lock); 281 + 282 + void ib_umem_dmabuf_set_revoke_locked(struct ib_umem_dmabuf *umem_dmabuf, 283 + void (*revoke)(void *priv), void *priv) 284 + { 285 + dma_resv_assert_held(umem_dmabuf->attach->dmabuf->resv); 286 + 287 + umem_dmabuf->pinned_revoke = revoke; 288 + umem_dmabuf->private = priv; 289 + } 290 + EXPORT_SYMBOL(ib_umem_dmabuf_set_revoke_locked); 247 291 248 292 struct ib_umem_dmabuf *ib_umem_dmabuf_get_pinned(struct ib_device *device, 249 293 unsigned long offset, ··· 324 230 } 325 231 EXPORT_SYMBOL(ib_umem_dmabuf_get_pinned); 326 232 233 + void ib_umem_dmabuf_revoke_lock(struct ib_umem_dmabuf *umem_dmabuf) 234 + { 235 + struct dma_buf *dmabuf = umem_dmabuf->attach->dmabuf; 236 + 237 + dma_resv_lock(dmabuf->resv, NULL); 238 + } 239 + EXPORT_SYMBOL(ib_umem_dmabuf_revoke_lock); 240 + 241 + void ib_umem_dmabuf_revoke_unlock(struct ib_umem_dmabuf *umem_dmabuf) 242 + { 243 + struct dma_buf *dmabuf = umem_dmabuf->attach->dmabuf; 244 + 245 + dma_resv_unlock(dmabuf->resv); 246 + } 247 + EXPORT_SYMBOL(ib_umem_dmabuf_revoke_unlock); 248 + 327 249 void ib_umem_dmabuf_revoke(struct ib_umem_dmabuf *umem_dmabuf) 328 250 { 329 251 struct dma_buf *dmabuf = umem_dmabuf->attach->dmabuf; 330 252 331 253 dma_resv_lock(dmabuf->resv, NULL); 332 - if (umem_dmabuf->revoked) 333 - goto end; 334 - ib_umem_dmabuf_unmap_pages(umem_dmabuf); 335 - if (umem_dmabuf->pinned) { 336 - dma_buf_unpin(umem_dmabuf->attach); 337 - umem_dmabuf->pinned = 0; 338 - } 339 - umem_dmabuf->revoked = 1; 340 - end: 254 + 
ib_umem_dmabuf_revoke_locked(umem_dmabuf->attach); 341 255 dma_resv_unlock(dmabuf->resv); 342 256 } 343 257 EXPORT_SYMBOL(ib_umem_dmabuf_revoke);
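
`ib_umem_dmabuf_revoke_locked()` now serves both the dma-buf `invalidate_mappings` callback and `ib_umem_dmabuf_revoke()`, and its `revoked` flag makes repeated calls harmless. The userspace sketch below models the order of operations (driver callback, unmap, unpin) and the at-most-once guarantee. The struct and function names are hypothetical, and the reservation lock that serializes these calls in the kernel is not modelled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the fields ib_umem_dmabuf_revoke_locked() touches. */
struct model_umem_dmabuf {
	bool pinned;
	bool revoked;
	bool mapped;
	void (*pinned_revoke)(void *priv);  /* set via ..._set_revoke_locked() */
	void *private;
};

/* Driver callback first, then unmap, then unpin; never more than once. */
static void model_revoke_locked(struct model_umem_dmabuf *u)
{
	if (u->revoked)
		return;
	if (u->pinned_revoke)
		u->pinned_revoke(u->private);
	u->mapped = false;       /* ib_umem_dmabuf_unmap_pages() */
	if (u->pinned)
		u->pinned = false;   /* dma_buf_unpin() */
	u->revoked = true;
}

/* Example driver callback: counts how often it is invoked. */
static void count_revoke(void *priv)
{
	(*(int *)priv)++;
}
```

Per the kernel-doc above, a driver using the revocable variant registers its callback with `ib_umem_dmabuf_set_revoke_locked()` before dropping the reservation lock that `ib_umem_dmabuf_get_pinned_revocable_and_lock()` returns with.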

+19 -21

drivers/infiniband/core/uverbs_cmd.c

··· 83 83 return 0; 84 84 } 85 85 86 - /* 87 - * Copy a request from userspace. If the provided 'req' is larger than the 88 - * user buffer then the user buffer is zero extended into the 'req'. If 'req' 89 - * is smaller than the user buffer then the uncopied bytes in the user buffer 90 - * must be zero. 91 - */ 92 86 static int uverbs_request(struct uverbs_attr_bundle *attrs, void *req, 93 87 size_t req_len) 94 88 { 95 - if (copy_from_user(req, attrs->ucore.inbuf, 96 - min(attrs->ucore.inlen, req_len))) 97 - return -EFAULT; 89 + int ret; 98 90 99 - if (attrs->ucore.inlen < req_len) { 100 - memset(req + attrs->ucore.inlen, 0, 101 - req_len - attrs->ucore.inlen); 102 - } else if (attrs->ucore.inlen > req_len) { 103 - if (!ib_is_buffer_cleared(attrs->ucore.inbuf + req_len, 104 - attrs->ucore.inlen - req_len)) 105 - return -EOPNOTSUPP; 106 - } 107 - return 0; 91 + ret = copy_struct_from_user(req, req_len, attrs->ucore.inbuf, 92 + attrs->ucore.inlen); 93 + if (ret == -E2BIG) 94 + ret = -EOPNOTSUPP; 95 + return ret; 108 96 } 109 97 110 98 /* ··· 1020 1032 if (cmd->comp_vector >= attrs->ufile->device->num_comp_vectors) 1021 1033 return -EINVAL; 1022 1034 1035 + if (!cmd->cqe) 1036 + return -EINVAL; 1037 + 1023 1038 obj = (struct ib_ucq_object *)uobj_alloc(UVERBS_OBJECT_CQ, attrs, 1024 1039 &ib_dev); 1025 1040 if (IS_ERR(obj)) ··· 1059 1068 rdma_restrack_new(&cq->res, RDMA_RESTRACK_CQ); 1060 1069 rdma_restrack_set_name(&cq->res, NULL); 1061 1070 1062 - ret = ib_dev->ops.create_cq(cq, &attr, attrs); 1071 + if (ib_dev->ops.create_user_cq) 1072 + ret = ib_dev->ops.create_user_cq(cq, &attr, attrs); 1073 + else 1074 + ret = ib_dev->ops.create_cq(cq, &attr, attrs); 1063 1075 if (ret) 1064 1076 goto err_free; 1065 1077 rdma_restrack_add(&cq->res); ··· 1079 1085 return uverbs_response(attrs, &resp, sizeof(resp)); 1080 1086 1081 1087 err_free: 1088 + ib_umem_release(cq->umem); 1082 1089 rdma_restrack_put(&cq->res); 1083 1090 kfree(cq); 1084 1091 err_file: ··· 1138 1143 if (ret) 
1139 1144 return ret; 1140 1145 1146 + if (!cmd.cqe) 1147 + return -EINVAL; 1148 + 1141 1149 cq = uobj_get_obj_read(cq, UVERBS_OBJECT_CQ, cmd.cq_handle, attrs); 1142 1150 if (IS_ERR(cq)) 1143 1151 return PTR_ERR(cq); 1144 1152 1145 - ret = cq->device->ops.resize_cq(cq, cmd.cqe, &attrs->driver_udata); 1153 + ret = cq->device->ops.resize_user_cq(cq, cmd.cqe, &attrs->driver_udata); 1146 1154 if (ret) 1147 1155 goto out; 1148 1156 ··· 3804 3806 UAPI_DEF_WRITE_UDATA_IO( 3805 3807 struct ib_uverbs_resize_cq, 3806 3808 struct ib_uverbs_resize_cq_resp), 3807 - UAPI_DEF_METHOD_NEEDS_FN(resize_cq)), 3809 + UAPI_DEF_METHOD_NEEDS_FN(resize_user_cq)), 3808 3810 DECLARE_UVERBS_WRITE_EX( 3809 3811 IB_USER_VERBS_EX_CMD_CREATE_CQ, 3810 3812 ib_uverbs_ex_create_cq,
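
`uverbs_request()` now delegates to `copy_struct_from_user()`, which keeps the rules the removed comment described: a shorter user struct is zero-extended, and a longer one is accepted only if the extra bytes are zero. The model below reproduces those rules with an ordinary buffer in place of user memory (so the `-EFAULT` path is not modelled) and keeps the `-E2BIG` to `-EOPNOTSUPP` translation. `model_copy_struct()` and `model_uverbs_request()` are hypothetical names.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Userspace model of copy_struct_from_user(dst, ksize, src, usize). */
static int model_copy_struct(void *dst, size_t ksize, const void *src,
			     size_t usize)
{
	size_t size = usize < ksize ? usize : ksize;

	if (usize > ksize) {
		/* Newer userspace: bytes the kernel does not know must be zero. */
		const unsigned char *tail = (const unsigned char *)src + ksize;

		for (size_t i = 0; i < usize - ksize; i++)
			if (tail[i])
				return -E2BIG;
	}
	memcpy(dst, src, size);
	/* Older userspace: zero-extend the fields it did not send. */
	if (usize < ksize)
		memset((unsigned char *)dst + size, 0, ksize - size);
	return 0;
}

/* uverbs_request() reports non-zero trailing bytes as -EOPNOTSUPP. */
static int model_uverbs_request(void *req, size_t req_len, const void *inbuf,
				size_t inlen)
{
	int ret = model_copy_struct(req, req_len, inbuf, inlen);

	return ret == -E2BIG ? -EOPNOTSUPP : ret;
}
```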

+87

drivers/infiniband/core/uverbs_ioctl.c

··· 70 70 u64 internal_buffer[32]; 71 71 }; 72 72 73 + uverbs_api_ioctl_handler_fn uverbs_get_handler_fn(struct ib_udata *udata) 74 + { 75 + struct uverbs_attr_bundle *bundle = 76 + rdma_udata_to_uverbs_attr_bundle(udata); 77 + struct bundle_priv *pbundle = 78 + container_of(&bundle->hdr, struct bundle_priv, bundle); 79 + 80 + lockdep_assert_held(&bundle->ufile->device->disassociate_srcu); 81 + 82 + return srcu_dereference(pbundle->method_elm->handler, 83 + &bundle->ufile->device->disassociate_srcu); 84 + } 85 + 73 86 /* 74 87 * Each method has an absolute minimum amount of memory it needs to allocate, 75 88 * precompute that amount and determine if the onstack memory can be used or ··· 860 847 pbundle->uobj_hw_obj_valid); 861 848 } 862 849 EXPORT_SYMBOL(uverbs_finalize_uobj_create); 850 + 851 + int _ib_copy_validate_udata_in(struct ib_udata *udata, void *req, 852 + size_t kernel_size, size_t minimum_size) 853 + { 854 + int err; 855 + 856 + if (udata->inlen < minimum_size) { 857 + ibdev_dbg( 858 + rdma_udata_to_dev(udata), 859 + "System call driver input udata too small (%zu < %zu) for ioctl %ps called by %pSR\n", 860 + udata->inlen, minimum_size, 861 + uverbs_get_handler_fn(udata), 862 + __builtin_return_address(0)); 863 + return -EINVAL; 864 + } 865 + 866 + err = copy_struct_from_user(req, kernel_size, udata->inbuf, 867 + udata->inlen); 868 + if (err) { 869 + if (err == -E2BIG) { 870 + ibdev_dbg( 871 + rdma_udata_to_dev(udata), 872 + "System call driver input udata not zero from %zu -> %zu for ioctl %ps called by %pSR\n", 873 + minimum_size, udata->inlen, 874 + uverbs_get_handler_fn(udata), 875 + __builtin_return_address(0)); 876 + return -EOPNOTSUPP; 877 + } 878 + ibdev_dbg( 879 + rdma_udata_to_dev(udata), 880 + "System call driver input udata EFAULT for ioctl %ps called by %pSR\n", 881 + uverbs_get_handler_fn(udata), 882 + __builtin_return_address(0)); 883 + return err; 884 + } 885 + return 0; 886 + } 887 + EXPORT_SYMBOL(_ib_copy_validate_udata_in); 888 + 889 + 
int _ib_copy_validate_udata_cm_fail(struct ib_udata *udata, u64 req_cm, 890 + u64 valid_cm) 891 + { 892 + ibdev_dbg( 893 + rdma_udata_to_dev(udata), 894 + "System call driver input udata has unsupported comp_mask %llx & ~%llx = %llx for ioctl %ps called by %pSR\n", 895 + req_cm, valid_cm, req_cm & ~valid_cm, 896 + uverbs_get_handler_fn(udata), __builtin_return_address(0)); 897 + return -EOPNOTSUPP; 898 + } 899 + EXPORT_SYMBOL(_ib_copy_validate_udata_cm_fail); 900 + 901 + int _ib_respond_udata(struct ib_udata *udata, const void *src, size_t len) 902 + { 903 + size_t copy_len; 904 + 905 + /* 0 length copy_len is a NOP for copy_to_user() and doesn't fail. */ 906 + copy_len = min(len, udata->outlen); 907 + if (copy_to_user(udata->outbuf, src, copy_len)) 908 + goto err_fault; 909 + if (copy_len < udata->outlen) { 910 + if (clear_user(udata->outbuf + copy_len, 911 + udata->outlen - copy_len)) 912 + goto err_fault; 913 + } 914 + return 0; 915 + err_fault: 916 + ibdev_dbg( 917 + rdma_udata_to_dev(udata), 918 + "System call driver out udata has EFAULT (%zu into %zu) for ioctl %ps called by %pSR\n", 919 + len, udata->outlen, uverbs_get_handler_fn(udata), 920 + __builtin_return_address(0)); 921 + return -EFAULT; 922 + } 923 + EXPORT_SYMBOL(_ib_respond_udata);
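
`_ib_respond_udata()` copies at most `udata->outlen` bytes of the response and zero-fills whatever space remains in the user's output buffer, so stale bytes are never left behind. The sketch below models that rule with a plain buffer, plus a hypothetical comp_mask check of the kind whose failure `_ib_copy_validate_udata_cm_fail()` reports; the real check is in code not shown in this hunk. All `model_*` names are hypothetical, and the `copy_to_user()`/`clear_user()` fault paths are not modelled.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Copy min(len, outlen) bytes, then zero the rest of the output buffer. */
static size_t model_respond_udata(unsigned char *out, size_t outlen,
				  const void *src, size_t len)
{
	size_t copy_len = len < outlen ? len : outlen;

	memcpy(out, src, copy_len);
	if (copy_len < outlen)
		memset(out + copy_len, 0, outlen - copy_len);
	return copy_len;
}

/* Hypothetical check: any requested bit outside valid_cm is unsupported. */
static int model_check_comp_mask(unsigned long long req_cm,
				 unsigned long long valid_cm)
{
	return (req_cm & ~valid_cm) ? -EOPNOTSUPP : 0;
}
```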

+27 -14

drivers/infiniband/core/uverbs_std_types_cq.c

··· 78 78 int buffer_fd; 79 79 int ret; 80 80 81 - if ((!ib_dev->ops.create_cq && !ib_dev->ops.create_cq_umem) || !ib_dev->ops.destroy_cq) 81 + if ((!ib_dev->ops.create_cq && !ib_dev->ops.create_user_cq) || 82 + !ib_dev->ops.destroy_cq) 82 83 return -EOPNOTSUPP; 83 84 84 85 ret = uverbs_copy_from(&attr.comp_vector, attrs, 85 86 UVERBS_ATTR_CREATE_CQ_COMP_VECTOR); 86 - if (!ret) 87 - ret = uverbs_copy_from(&attr.cqe, attrs, 88 - UVERBS_ATTR_CREATE_CQ_CQE); 89 - if (!ret) 90 - ret = uverbs_copy_from(&user_handle, attrs, 91 - UVERBS_ATTR_CREATE_CQ_USER_HANDLE); 87 + if (ret) 88 + return ret; 89 + 90 + ret = uverbs_copy_from(&attr.cqe, attrs, UVERBS_ATTR_CREATE_CQ_CQE); 91 + if (ret || !attr.cqe) 92 + return ret ? : -EINVAL; 93 + 94 + ret = uverbs_copy_from(&user_handle, attrs, 95 + UVERBS_ATTR_CREATE_CQ_USER_HANDLE); 92 96 if (ret) 93 97 return ret; 94 98 ··· 134 130 135 131 if (uverbs_attr_is_valid(attrs, UVERBS_ATTR_CREATE_CQ_BUFFER_FD) || 136 132 uverbs_attr_is_valid(attrs, UVERBS_ATTR_CREATE_CQ_BUFFER_OFFSET) || 137 - !ib_dev->ops.create_cq_umem) { 133 + !ib_dev->ops.create_user_cq) { 138 134 ret = -EINVAL; 139 135 goto err_event_file; 140 136 } ··· 159 155 goto err_event_file; 160 156 161 157 if (uverbs_attr_is_valid(attrs, UVERBS_ATTR_CREATE_CQ_BUFFER_VA) || 162 - !ib_dev->ops.create_cq_umem) { 158 + !ib_dev->ops.create_user_cq) { 163 159 ret = -EINVAL; 164 160 goto err_event_file; 165 161 } ··· 172 168 } 173 169 umem = &umem_dmabuf->umem; 174 170 } else if (uverbs_attr_is_valid(attrs, UVERBS_ATTR_CREATE_CQ_BUFFER_OFFSET) || 175 - uverbs_attr_is_valid(attrs, UVERBS_ATTR_CREATE_CQ_BUFFER_LENGTH) || 176 - !ib_dev->ops.create_cq) { 171 + uverbs_attr_is_valid(attrs, UVERBS_ATTR_CREATE_CQ_BUFFER_LENGTH)) { 177 172 ret = -EINVAL; 178 173 goto err_event_file; 179 174 } ··· 189 186 cq->comp_handler = ib_uverbs_comp_handler; 190 187 cq->event_handler = ib_uverbs_cq_event_handler; 191 188 cq->cq_context = ev_file ? 
&ev_file->ev_queue : NULL; 189 + /* 190 + * If UMEM is not provided here, legacy drivers will set it during 191 + * CQ creation based on their internal udata. 192 + */ 193 + cq->umem = umem; 192 194 atomic_set(&cq->usecnt, 0); 193 195 194 196 rdma_restrack_new(&cq->res, RDMA_RESTRACK_CQ); 195 197 rdma_restrack_set_name(&cq->res, NULL); 196 198 197 - ret = umem ? ib_dev->ops.create_cq_umem(cq, &attr, umem, attrs) : 198 - ib_dev->ops.create_cq(cq, &attr, attrs); 199 + if (ib_dev->ops.create_user_cq) 200 + ret = ib_dev->ops.create_user_cq(cq, &attr, attrs); 201 + else 202 + ret = ib_dev->ops.create_cq(cq, &attr, attrs); 199 203 if (ret) 200 204 goto err_free; 205 + 206 + /* Check that driver didn't overrun existing umem */ 207 + WARN_ON(umem && cq->umem != umem); 201 208 202 209 obj->uevent.uobject.object = cq; 203 210 obj->uevent.uobject.user_handle = user_handle; ··· 219 206 return ret; 220 207 221 208 err_free: 222 - ib_umem_release(umem); 209 + ib_umem_release(cq->umem); 223 210 rdma_restrack_put(&cq->res); 224 211 kfree(cq); 225 212 err_event_file:
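In the CQ creation handler above, the core now handles the user buffer in four steps. It stores any user-supplied buffer in `cq->umem` before calling the driver. It lets a legacy driver fill that field in when it is NULL. It releases `cq->umem` itself on failure. It warns if a driver replaced a buffer the core provided. `ib_destroy_cq_user()` in verbs.c likewise releases `cq->umem` after the driver's destroy callback.

The new `return ret ? : -EINVAL;` uses the GNU omitted-middle-operand form and is equivalent to `ret ? ret : -EINVAL`.

The userspace sketch below models that ownership contract. `struct umem`, `struct cq` and all helper names are hypothetical, and malloc()/free() stand in for pinning and releasing user memory. It is not kernel code. The model driver stores its buffer only after a successful pin, so the core's error path never receives an invalid pointer.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-ins for struct ib_umem and struct ib_cq. */
struct umem {
	size_t len;
};

struct cq {
	struct umem *umem;
};

static int live_umems; /* pinned buffers not yet released */

static struct umem *umem_get(size_t len)
{
	struct umem *u = malloc(sizeof(*u));

	if (u) {
		u->len = len;
		live_umems++;
	}
	return u;
}

/* Like ib_umem_release(), releasing NULL is a no-op. */
static void umem_release(struct umem *u)
{
	if (!u)
		return;
	free(u);
	live_umems--;
}

/*
 * Legacy-style driver callback: pins its own buffer only when the core
 * did not supply one, and stores the pointer only after success.
 */
static int driver_create_cq(struct cq *cq, size_t bytes, int fail)
{
	if (!cq->umem) {
		struct umem *u = umem_get(bytes);

		if (!u)
			return -ENOMEM;
		cq->umem = u;
	}
	return fail ? -EIO : 0; /* simulate a later firmware failure */
}

static int core_create_cq(struct cq *cq, unsigned int cqe,
			  struct umem *user_umem, size_t bytes, int fail)
{
	int ret;

	if (!cqe)
		return -EINVAL; /* zero-depth CQs are rejected up front */

	cq->umem = user_umem; /* NULL: the driver pins it itself */
	ret = driver_create_cq(cq, bytes, fail);
	if (ret) {
		/* The core releases whatever umem ended up in the CQ. */
		umem_release(cq->umem);
		cq->umem = NULL;
		return ret;
	}
	/* Mirrors WARN_ON(umem && cq->umem != umem). */
	if (user_umem && cq->umem != user_umem)
		fprintf(stderr, "WARN: driver replaced the core umem\n");
	return 0;
}

static void core_destroy_cq(struct cq *cq)
{
	umem_release(cq->umem); /* as ib_destroy_cq_user() now does */
	cq->umem = NULL;
}
```

The model covers three cases: a core-supplied buffer, a driver-pinned buffer, and a driver that fails after pinning. In all three, `live_umems` is back at zero after teardown. That is what a single release point in the core buys: drivers no longer need their own `ib_umem_release()` on the destroy path.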

+8

drivers/infiniband/core/uverbs_std_types_device.c

··· 247 247 { 248 248 u32 num_comp = attrs->ufile->device->num_comp_vectors; 249 249 u64 core_support = IB_UVERBS_CORE_SUPPORT_OPTIONAL_MR_ACCESS; 250 + struct ib_device *ib_dev; 250 251 int ret; 252 + 253 + ib_dev = srcu_dereference(attrs->ufile->device->ib_dev, 254 + &attrs->ufile->device->disassociate_srcu); 255 + if (!ib_dev) 256 + return -EIO; 251 257 252 258 ret = uverbs_copy_to(attrs, UVERBS_ATTR_GET_CONTEXT_NUM_COMP_VECTORS, 253 259 &num_comp, sizeof(num_comp)); 254 260 if (IS_UVERBS_COPY_ERR(ret)) 255 261 return ret; 256 262 263 + if (ib_dev->ops.uverbs_robust_udata) 264 + core_support |= IB_UVERBS_CORE_SUPPORT_ROBUST_UDATA; 257 265 ret = uverbs_copy_to(attrs, UVERBS_ATTR_GET_CONTEXT_CORE_SUPPORT, 258 266 &core_support, sizeof(core_support)); 259 267 if (IS_UVERBS_COPY_ERR(ret))
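The device-context hunk computes the `core_support` bitmask per device. `IB_UVERBS_CORE_SUPPORT_OPTIONAL_MR_ACCESS` is always reported. `IB_UVERBS_CORE_SUPPORT_ROBUST_UDATA` is added only when the driver sets `ops.uverbs_robust_udata`. The handler now also returns `-EIO` when `srcu_dereference()` yields no `ib_dev`, that is, after the device has been disassociated.

Below is a minimal sketch of that logic. It makes two assumptions: the bit values are hypothetical, and `uverbs_robust_udata` is treated as a boolean opt-in, since its actual type is not visible in this diff.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bit values standing in for IB_UVERBS_CORE_SUPPORT_*. */
#define CORE_SUPPORT_OPTIONAL_MR_ACCESS (1ULL << 0)
#define CORE_SUPPORT_ROBUST_UDATA (1ULL << 1)

/* Hypothetical subset of ib_device_ops: a per-driver opt-in flag. */
struct dev_ops {
	bool uverbs_robust_udata;
};

/*
 * ops == NULL models srcu_dereference() returning NULL after the
 * device was disassociated.
 */
static int query_core_support(const struct dev_ops *ops, uint64_t *out)
{
	uint64_t mask = CORE_SUPPORT_OPTIONAL_MR_ACCESS; /* always set */

	if (!ops)
		return -EIO;
	if (ops->uverbs_robust_udata)
		mask |= CORE_SUPPORT_ROBUST_UDATA;
	*out = mask;
	return 0;
}

/* Consumer side: test one capability bit in the returned mask. */
static bool has_core_support(uint64_t mask, uint64_t bit)
{
	return (mask & bit) == bit;
}
```

Advertising the capability as a bit lets a user library probe for the behavior once per context instead of inferring it from the kernel version.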

+10 -49

drivers/infiniband/core/verbs.c

··· 49 49 #include <rdma/ib_verbs.h> 50 50 #include <rdma/ib_cache.h> 51 51 #include <rdma/ib_addr.h> 52 + #include <rdma/ib_umem.h> 52 53 #include <rdma/rw.h> 53 54 #include <rdma/lag.h> 54 55 ··· 2203 2202 if (!cq) 2204 2203 return ERR_PTR(-ENOMEM); 2205 2204 2205 + if (WARN_ON_ONCE(!cq_attr->cqe)) 2206 + return ERR_PTR(-EINVAL); 2207 + 2206 2208 cq->device = device; 2207 - cq->uobject = NULL; 2208 2209 cq->comp_handler = comp_handler; 2209 2210 cq->event_handler = event_handler; 2210 2211 cq->cq_context = cq_context; ··· 2221 2218 kfree(cq); 2222 2219 return ERR_PTR(ret); 2223 2220 } 2221 + /* 2222 + * We are in kernel verbs flow and drivers are not allowed 2223 + * to set umem pointer, it needs to stay NULL. 2224 + */ 2225 + WARN_ON_ONCE(cq->umem); 2224 2226 2225 2227 rdma_restrack_add(&cq->res); 2226 2228 return cq; ··· 2257 2249 if (ret) 2258 2250 return ret; 2259 2251 2252 + ib_umem_release(cq->umem); 2260 2253 rdma_restrack_del(&cq->res); 2261 2254 kfree(cq); 2262 2255 return ret; 2263 2256 } 2264 2257 EXPORT_SYMBOL(ib_destroy_cq_user); 2265 - 2266 - int ib_resize_cq(struct ib_cq *cq, int cqe) 2267 - { 2268 - if (cq->shared) 2269 - return -EOPNOTSUPP; 2270 - 2271 - return cq->device->ops.resize_cq ? 
2272 - cq->device->ops.resize_cq(cq, cqe, NULL) : -EOPNOTSUPP; 2273 - } 2274 - EXPORT_SYMBOL(ib_resize_cq); 2275 2258 2276 2259 /* Memory regions */ 2277 2260 ··· 3152 3153 netdev, params.param); 3153 3154 } 3154 3155 EXPORT_SYMBOL(rdma_init_netdev); 3155 - 3156 - void __rdma_block_iter_start(struct ib_block_iter *biter, 3157 - struct scatterlist *sglist, unsigned int nents, 3158 - unsigned long pgsz) 3159 - { 3160 - memset(biter, 0, sizeof(struct ib_block_iter)); 3161 - biter->__sg = sglist; 3162 - biter->__sg_nents = nents; 3163 - 3164 - /* Driver provides best block size to use */ 3165 - biter->__pg_bit = __fls(pgsz); 3166 - } 3167 - EXPORT_SYMBOL(__rdma_block_iter_start); 3168 - 3169 - bool __rdma_block_iter_next(struct ib_block_iter *biter) 3170 - { 3171 - unsigned int block_offset; 3172 - unsigned int delta; 3173 - 3174 - if (!biter->__sg_nents || !biter->__sg) 3175 - return false; 3176 - 3177 - biter->__dma_addr = sg_dma_address(biter->__sg) + biter->__sg_advance; 3178 - block_offset = biter->__dma_addr & (BIT_ULL(biter->__pg_bit) - 1); 3179 - delta = BIT_ULL(biter->__pg_bit) - block_offset; 3180 - 3181 - while (biter->__sg_nents && biter->__sg && 3182 - sg_dma_len(biter->__sg) - biter->__sg_advance <= delta) { 3183 - delta -= sg_dma_len(biter->__sg) - biter->__sg_advance; 3184 - biter->__sg_advance = 0; 3185 - biter->__sg = sg_next(biter->__sg); 3186 - biter->__sg_nents--; 3187 - } 3188 - biter->__sg_advance += delta; 3189 - 3190 - return true; 3191 - } 3192 - EXPORT_SYMBOL(__rdma_block_iter_next); 3193 3156 3194 3157 /** 3195 3158 * rdma_alloc_hw_stats_struct - Helper function to allocate dynamic struct
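The verbs.c hunk removes `__rdma_block_iter_start()` and `__rdma_block_iter_next()` from this file. These helpers walk a DMA-mapped scatterlist in fixed-size, aligned blocks. Each step jumps to the next block boundary and skips any segments that the jump fully consumes.

The sketch below re-expresses the removed loop in userspace C. It iterates over a plain array of (address, length) segments instead of a `struct scatterlist`, and all struct and function names are hypothetical. `block_iter_addr()` masks the raw address down to the block boundary, assuming the same masking that `rdma_block_iter_dma_address()` applies in the kernel headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for one DMA-mapped scatterlist entry. */
struct seg {
	uint64_t dma_addr;
	uint32_t dma_len;
};

struct block_iter {
	const struct seg *sg;  /* current segment */
	unsigned int nents;    /* segments left, including current */
	uint64_t sg_advance;   /* bytes consumed in current segment */
	uint64_t dma_addr;     /* raw address of the current block */
	unsigned int pg_bit;   /* log2 of the block size */
};

static void block_iter_start(struct block_iter *it, const struct seg *sg,
			     unsigned int nents, unsigned int pg_bit)
{
	memset(it, 0, sizeof(*it));
	it->sg = sg;
	it->nents = nents;
	it->pg_bit = pg_bit;
}

static bool block_iter_next(struct block_iter *it)
{
	uint64_t block_offset, delta;

	if (!it->nents || !it->sg)
		return false;

	it->dma_addr = it->sg->dma_addr + it->sg_advance;
	block_offset = it->dma_addr & ((1ULL << it->pg_bit) - 1);
	delta = (1ULL << it->pg_bit) - block_offset; /* to next boundary */

	/* Skip every segment that ends at or before the next boundary. */
	while (it->nents && it->sg &&
	       it->sg->dma_len - it->sg_advance <= delta) {
		delta -= it->sg->dma_len - it->sg_advance;
		it->sg_advance = 0;
		it->sg++;
		it->nents--;
	}
	it->sg_advance += delta;
	return true;
}

/* Block-aligned address of the block just returned by next(). */
static uint64_t block_iter_addr(const struct block_iter *it)
{
	return it->dma_addr & ~((1ULL << it->pg_bit) - 1);
}
```

With 4 KiB blocks (`pg_bit = 12`), consider a 12 KiB segment at 0x1000 followed by a 4 KiB segment at 0x10000. The walk yields four blocks: 0x1000, 0x2000, 0x3000 and 0x10000.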

+1 -1

drivers/infiniband/hw/bnxt_re/Makefile

··· 5 5 bnxt_re-y := main.o ib_verbs.o \ 6 6 qplib_res.o qplib_rcfw.o \ 7 7 qplib_sp.o qplib_fp.o hw_counters.o \ 8 - debugfs.o 8 + debugfs.o uapi.o
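The bnxt_re ib_verbs.c changes that follow fold a repeated pattern into one call. Previously, each `bnxt_re_init_depth()` call was followed by a separate `min_t()` or `if (entries > max)` clamp. Now a single call takes the maximum as its second argument.

The helper's body is not part of this diff. The sketch below assumes it behaves like the earlier two-argument form: it rounds up to a power of two only for user contexts that have not opted out, then clamps to the maximum. `init_depth()` and `roundup_pow2()` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Smallest power of two >= v; intended for 1 <= v <= 2^31. */
static uint32_t roundup_pow2(uint32_t v)
{
	uint32_t r = 1;

	assert(v <= (1u << 31));
	while (r < v)
		r <<= 1;
	return r;
}

/*
 * Hypothetical model of bnxt_re_init_depth(ent, max, uctx): round up
 * to a power of two unless the context opted out, then clamp to max.
 */
static uint32_t init_depth(uint32_t ent, uint32_t max, bool pow2)
{
	uint32_t depth = pow2 ? roundup_pow2(ent) : ent;

	return depth < max ? depth : max;
}
```

The clamp matters because rounding can overshoot the device limit. Without it, a request of 1025 entries against a limit of 1025 would round to 2048. Moving the clamp into the helper means no call site can forget it.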

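The `bnxt_re_query_device()` hunk that follows removes open-coded handling of the user output buffer and calls `ib_respond_udata()` instead. The removed code did two things: it tested `offsetofend()` before filling `packet_pacing_caps`, and it called `ib_copy_to_udata()` with `min(sizeof(resp), outlen)` bytes. How the new helper treats short buffers is not shown here.

The sketch below models only the removed pattern, in userspace C. `struct query_resp` and its layout are hypothetical. `offsetofend()` is redefined locally using the same formula as the kernel macro.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Offset of the first byte past MEMBER, as the kernel macro computes. */
#define offsetofend(TYPE, MEMBER) \
	(offsetof(TYPE, MEMBER) + sizeof(((TYPE *)0)->MEMBER))

/* Hypothetical response: an older ABI ends before 'pacing'. */
struct query_resp {
	uint32_t comp_mask;
	uint32_t reserved;
	struct {
		uint32_t min, max, qpts;
	} pacing;
};

/* Copy as much of the response as the caller's buffer can hold. */
static size_t copy_resp(void *out, size_t outlen,
			const struct query_resp *resp)
{
	size_t n = outlen < sizeof(*resp) ? outlen : sizeof(*resp);

	memcpy(out, resp, n);
	return n;
}

/* Fill optional fields only if the caller's buffer covers them. */
static int fills_pacing(size_t outlen)
{
	return offsetofend(struct query_resp, pacing) <= outlen;
}
```

In this model, an older user library whose response struct ends after `reserved` passes `outlen == 8`. It receives only the first 8 bytes, and the pacing fields are skipped. Every driver had to repeat this logic, which the shared helper now centralizes.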
+433 -469

drivers/infiniband/hw/bnxt_re/ib_verbs.c

··· 187 187 struct bnxt_re_dev *rdev = to_bnxt_re_dev(ibdev, ibdev); 188 188 struct bnxt_qplib_dev_attr *dev_attr = rdev->dev_attr; 189 189 struct bnxt_re_query_device_ex_resp resp = {}; 190 - size_t outlen = (udata) ? udata->outlen : 0; 191 190 int rc = 0; 191 + 192 + rc = ib_is_udata_in_empty(udata); 193 + if (rc) 194 + return rc; 192 195 193 196 memset(ib_attr, 0, sizeof(*ib_attr)); 194 197 memcpy(&ib_attr->fw_ver, dev_attr->fw_ver, ··· 257 254 ib_attr->max_pkeys = 1; 258 255 ib_attr->local_ca_ack_delay = BNXT_RE_DEFAULT_ACK_DELAY; 259 256 260 - if ((offsetofend(typeof(resp), packet_pacing_caps) <= outlen) && 261 - _is_modify_qp_rate_limit_supported(dev_attr->dev_cap_flags2)) { 257 + if (_is_modify_qp_rate_limit_supported(dev_attr->dev_cap_flags2)) { 262 258 resp.packet_pacing_caps.qp_rate_limit_min = 263 259 dev_attr->rate_limit_min; 264 260 resp.packet_pacing_caps.qp_rate_limit_max = ··· 265 263 resp.packet_pacing_caps.supported_qpts = 266 264 1 << IB_QPT_RC; 267 265 } 268 - if (outlen) 269 - rc = ib_copy_to_udata(udata, &resp, 270 - min(sizeof(resp), outlen)); 271 - 272 - return rc; 266 + return ib_respond_udata(udata, resp); 273 267 } 274 268 275 269 int bnxt_re_modify_device(struct ib_device *ibdev, ··· 642 644 return rc; 643 645 } 644 646 645 - static struct bnxt_re_user_mmap_entry* 647 + struct bnxt_re_user_mmap_entry* 646 648 bnxt_re_mmap_entry_insert(struct bnxt_re_ucontext *uctx, u64 mem_offset, 647 649 enum bnxt_re_mmap_flag mmap_flag, u64 *offset) 648 650 { ··· 690 692 { 691 693 struct bnxt_re_pd *pd = container_of(ib_pd, struct bnxt_re_pd, ib_pd); 692 694 struct bnxt_re_dev *rdev = pd->rdev; 695 + int ret; 696 + 697 + ret = ib_is_udata_in_empty(udata); 698 + if (ret) 699 + return ret; 693 700 694 701 if (udata) { 695 702 rdma_user_mmap_entry_remove(pd->pd_db_mmap); ··· 709 706 &pd->qplib_pd)) 710 707 atomic_dec(&rdev->stats.res.pd_count); 711 708 } 712 - return 0; 709 + return ib_respond_empty_udata(udata); 713 710 } 714 711 715 712 int 
bnxt_re_alloc_pd(struct ib_pd *ibpd, struct ib_udata *udata) ··· 722 719 struct bnxt_re_user_mmap_entry *entry = NULL; 723 720 u32 active_pds; 724 721 int rc = 0; 722 + 723 + rc = ib_is_udata_in_empty(udata); 724 + if (rc) 725 + return rc; 725 726 726 727 pd->rdev = rdev; 727 728 if (bnxt_qplib_alloc_pd(&rdev->qplib_res, &pd->qplib_pd)) { ··· 763 756 764 757 pd->pd_db_mmap = &entry->rdma_entry; 765 758 766 - rc = ib_copy_to_udata(udata, &resp, min(sizeof(resp), udata->outlen)); 759 + rc = ib_respond_udata(udata, resp); 767 760 if (rc) { 768 761 rdma_user_mmap_entry_remove(pd->pd_db_mmap); 769 762 rc = -EFAULT; ··· 841 834 u8 nw_type; 842 835 int rc; 843 836 837 + rc = ib_is_udata_in_empty(udata); 838 + if (rc) 839 + return rc; 840 + 844 841 if (!(rdma_ah_get_ah_flags(ah_attr) & IB_AH_GRH)) { 845 842 ibdev_err(&rdev->ibdev, "Failed to alloc AH: GRH not set"); 846 843 return -EINVAL; ··· 898 887 if (active_ahs > rdev->stats.res.ah_watermark) 899 888 rdev->stats.res.ah_watermark = active_ahs; 900 889 901 - return 0; 890 + return ib_respond_empty_udata(udata); 902 891 } 903 892 904 893 int bnxt_re_query_ah(struct ib_ah *ib_ah, struct rdma_ah_attr *ah_attr) ··· 995 984 dev_err(rdev_to_dev(rdev), "Failed to delete unique GID, rc: %d\n", rc); 996 985 } 997 986 987 + static void bnxt_re_qp_free_umem(struct bnxt_re_qp *qp) 988 + { 989 + ib_umem_release(qp->rumem); 990 + ib_umem_release(qp->sumem); 991 + } 992 + 998 993 /* Queue Pairs */ 999 994 int bnxt_re_destroy_qp(struct ib_qp *ib_qp, struct ib_udata *udata) 1000 995 { ··· 1011 994 struct bnxt_qplib_nq *rcq_nq = NULL; 1012 995 unsigned int flags; 1013 996 int rc; 997 + 998 + rc = ib_is_udata_in_empty(udata); 999 + if (rc) 1000 + return rc; 1014 1001 1015 1002 bnxt_re_debug_rem_qpinfo(rdev, qp); 1016 1003 ··· 1047 1026 if (qp->qplib_qp.type == CMDQ_CREATE_QP_TYPE_RAW_ETHERTYPE) 1048 1027 bnxt_re_del_unique_gid(rdev); 1049 1028 1050 - ib_umem_release(qp->rumem); 1051 - ib_umem_release(qp->sumem); 1029 + 
bnxt_re_qp_free_umem(qp); 1052 1030 1053 1031 /* Flush all the entries of notification queue associated with 1054 1032 * given qp. ··· 1058 1038 if (scq_nq != rcq_nq) 1059 1039 bnxt_re_synchronize_nq(rcq_nq); 1060 1040 1061 - return 0; 1041 + return ib_respond_empty_udata(udata); 1062 1042 } 1063 1043 1064 1044 static u8 __from_ib_qp_type(enum ib_qp_type type) ··· 1191 1171 } 1192 1172 1193 1173 qplib_qp->dpi = &cntx->dpi; 1174 + qplib_qp->is_user = true; 1194 1175 return 0; 1195 1176 rqfail: 1196 1177 ib_umem_release(qp->sumem); ··· 1249 1228 return NULL; 1250 1229 } 1251 1230 1231 + static int bnxt_re_qp_alloc_init_xrrq(struct bnxt_re_qp *qp) 1232 + { 1233 + struct bnxt_qplib_res *res = &qp->rdev->qplib_res; 1234 + struct bnxt_qplib_qp *qplib_qp = &qp->qplib_qp; 1235 + struct bnxt_qplib_hwq_attr hwq_attr = {}; 1236 + struct bnxt_qplib_sg_info sginfo = {}; 1237 + struct bnxt_qplib_hwq *irrq, *orrq; 1238 + int rc, req_size; 1239 + 1240 + orrq = &qplib_qp->orrq; 1241 + orrq->max_elements = 1242 + ORD_LIMIT_TO_ORRQ_SLOTS(qplib_qp->max_rd_atomic); 1243 + req_size = orrq->max_elements * 1244 + BNXT_QPLIB_MAX_ORRQE_ENTRY_SIZE + PAGE_SIZE - 1; 1245 + req_size &= ~(PAGE_SIZE - 1); 1246 + sginfo.pgsize = req_size; 1247 + sginfo.pgshft = PAGE_SHIFT; 1248 + 1249 + hwq_attr.res = res; 1250 + hwq_attr.sginfo = &sginfo; 1251 + hwq_attr.depth = orrq->max_elements; 1252 + hwq_attr.stride = BNXT_QPLIB_MAX_ORRQE_ENTRY_SIZE; 1253 + hwq_attr.aux_stride = 0; 1254 + hwq_attr.aux_depth = 0; 1255 + hwq_attr.type = HWQ_TYPE_CTX; 1256 + rc = bnxt_qplib_alloc_init_hwq(orrq, &hwq_attr); 1257 + if (rc) 1258 + return rc; 1259 + 1260 + irrq = &qplib_qp->irrq; 1261 + irrq->max_elements = 1262 + IRD_LIMIT_TO_IRRQ_SLOTS(qplib_qp->max_dest_rd_atomic); 1263 + req_size = irrq->max_elements * 1264 + BNXT_QPLIB_MAX_IRRQE_ENTRY_SIZE + PAGE_SIZE - 1; 1265 + req_size &= ~(PAGE_SIZE - 1); 1266 + sginfo.pgsize = req_size; 1267 + hwq_attr.sginfo = &sginfo; 1268 + hwq_attr.depth = irrq->max_elements; 1269 + 
hwq_attr.stride = BNXT_QPLIB_MAX_IRRQE_ENTRY_SIZE; 1270 + rc = bnxt_qplib_alloc_init_hwq(irrq, &hwq_attr); 1271 + if (rc) 1272 + goto free_orrq_hwq; 1273 + return 0; 1274 + free_orrq_hwq: 1275 + bnxt_qplib_free_hwq(res, orrq); 1276 + return rc; 1277 + } 1278 + 1279 + static int bnxt_re_setup_qp_hwqs(struct bnxt_re_qp *qp) 1280 + { 1281 + struct bnxt_qplib_res *res = &qp->rdev->qplib_res; 1282 + struct bnxt_qplib_qp *qplib_qp = &qp->qplib_qp; 1283 + struct bnxt_qplib_hwq_attr hwq_attr = {}; 1284 + struct bnxt_qplib_q *sq = &qplib_qp->sq; 1285 + struct bnxt_qplib_q *rq = &qplib_qp->rq; 1286 + u8 wqe_mode = qplib_qp->wqe_mode; 1287 + u8 pg_sz_lvl; 1288 + int rc; 1289 + 1290 + hwq_attr.res = res; 1291 + hwq_attr.sginfo = &sq->sg_info; 1292 + hwq_attr.stride = bnxt_qplib_get_stride(); 1293 + hwq_attr.depth = bnxt_qplib_get_depth(sq, wqe_mode, true); 1294 + hwq_attr.aux_stride = qplib_qp->psn_sz; 1295 + hwq_attr.aux_depth = (qplib_qp->psn_sz) ? 1296 + bnxt_qplib_set_sq_size(sq, wqe_mode) : 0; 1297 + if (qplib_qp->is_host_msn_tbl && qplib_qp->psn_sz) 1298 + hwq_attr.aux_depth = qplib_qp->msn_tbl_sz; 1299 + hwq_attr.type = HWQ_TYPE_QUEUE; 1300 + rc = bnxt_qplib_alloc_init_hwq(&sq->hwq, &hwq_attr); 1301 + if (rc) 1302 + return rc; 1303 + 1304 + pg_sz_lvl = bnxt_qplib_base_pg_size(&sq->hwq) << CMDQ_CREATE_QP_SQ_PG_SIZE_SFT; 1305 + pg_sz_lvl |= ((sq->hwq.level & CMDQ_CREATE_QP_SQ_LVL_MASK) << 1306 + CMDQ_CREATE_QP_SQ_LVL_SFT); 1307 + sq->hwq.pg_sz_lvl = pg_sz_lvl; 1308 + 1309 + hwq_attr.res = res; 1310 + hwq_attr.sginfo = &rq->sg_info; 1311 + hwq_attr.stride = bnxt_qplib_get_stride(); 1312 + hwq_attr.depth = bnxt_qplib_get_depth(rq, qplib_qp->wqe_mode, false); 1313 + hwq_attr.aux_stride = 0; 1314 + hwq_attr.aux_depth = 0; 1315 + hwq_attr.type = HWQ_TYPE_QUEUE; 1316 + rc = bnxt_qplib_alloc_init_hwq(&rq->hwq, &hwq_attr); 1317 + if (rc) 1318 + goto free_sq_hwq; 1319 + pg_sz_lvl = bnxt_qplib_base_pg_size(&rq->hwq) << 1320 + CMDQ_CREATE_QP_RQ_PG_SIZE_SFT; 1321 + pg_sz_lvl |= 
((rq->hwq.level & CMDQ_CREATE_QP_RQ_LVL_MASK) << 1322 + CMDQ_CREATE_QP_RQ_LVL_SFT); 1323 + rq->hwq.pg_sz_lvl = pg_sz_lvl; 1324 + 1325 + if (qplib_qp->psn_sz) { 1326 + rc = bnxt_re_qp_alloc_init_xrrq(qp); 1327 + if (rc) 1328 + goto free_rq_hwq; 1329 + } 1330 + 1331 + return 0; 1332 + free_rq_hwq: 1333 + bnxt_qplib_free_hwq(res, &rq->hwq); 1334 + free_sq_hwq: 1335 + bnxt_qplib_free_hwq(res, &sq->hwq); 1336 + return rc; 1337 + } 1338 + 1252 1339 static struct bnxt_re_qp *bnxt_re_create_shadow_qp 1253 1340 (struct bnxt_re_pd *pd, 1254 1341 struct bnxt_qplib_res *qp1_res, ··· 1378 1249 qp->qplib_qp.pd = &pd->qplib_pd; 1379 1250 qp->qplib_qp.qp_handle = (u64)(unsigned long)(&qp->qplib_qp); 1380 1251 qp->qplib_qp.type = IB_QPT_UD; 1252 + qp->qplib_qp.cctx = rdev->chip_ctx; 1381 1253 1382 1254 qp->qplib_qp.max_inline_data = 0; 1383 1255 qp->qplib_qp.sig_type = true; ··· 1411 1281 qp->qplib_qp.rq_hdr_buf_size = BNXT_QPLIB_MAX_GRH_HDR_SIZE_IPV6; 1412 1282 qp->qplib_qp.dpi = &rdev->dpi_privileged; 1413 1283 1414 - rc = bnxt_qplib_create_qp(qp1_res, &qp->qplib_qp); 1284 + rc = bnxt_re_setup_qp_hwqs(qp); 1415 1285 if (rc) 1416 1286 goto fail; 1287 + 1288 + rc = bnxt_qplib_create_qp(qp1_res, &qp->qplib_qp); 1289 + if (rc) 1290 + goto free_hwq; 1417 1291 1418 1292 spin_lock_init(&qp->sq_lock); 1419 1293 INIT_LIST_HEAD(&qp->list); ··· 1426 1292 atomic_inc(&rdev->stats.res.qp_count); 1427 1293 mutex_unlock(&rdev->qp_lock); 1428 1294 return qp; 1295 + 1296 + free_hwq: 1297 + bnxt_qplib_free_qp_res(&rdev->qplib_res, &qp->qplib_qp); 1429 1298 fail: 1430 1299 kfree(qp); 1431 1300 return NULL; ··· 1442 1305 struct bnxt_qplib_qp *qplqp; 1443 1306 struct bnxt_re_dev *rdev; 1444 1307 struct bnxt_qplib_q *rq; 1445 - int entries; 1446 1308 1447 1309 rdev = qp->rdev; 1448 1310 qplqp = &qp->qplib_qp; ··· 1464 1328 /* Allocate 1 more than what's provided so posting max doesn't 1465 1329 * mean empty. 
1466 1330 */ 1467 - entries = bnxt_re_init_depth(init_attr->cap.max_recv_wr + 1, uctx); 1468 - rq->max_wqe = min_t(u32, entries, dev_attr->max_qp_wqes + 1); 1331 + rq->max_wqe = bnxt_re_init_depth(init_attr->cap.max_recv_wr + 1, 1332 + dev_attr->max_qp_wqes + 1, 1333 + uctx); 1469 1334 rq->max_sw_wqe = rq->max_wqe; 1470 1335 rq->q_full_delta = 0; 1471 1336 rq->sg_info.pgsize = PAGE_SIZE; ··· 1504 1367 struct bnxt_re_dev *rdev; 1505 1368 struct bnxt_qplib_q *sq; 1506 1369 int diff = 0; 1507 - int entries; 1508 1370 int rc; 1509 1371 1510 1372 rdev = qp->rdev; ··· 1512 1376 dev_attr = rdev->dev_attr; 1513 1377 1514 1378 sq->max_sge = init_attr->cap.max_send_sge; 1515 - entries = init_attr->cap.max_send_wr; 1516 1379 if (uctx && qplqp->wqe_mode == BNXT_QPLIB_WQE_MODE_VARIABLE) { 1517 1380 sq->max_wqe = ureq->sq_slots; 1518 1381 sq->max_sw_wqe = ureq->sq_slots; ··· 1527 1392 return rc; 1528 1393 1529 1394 /* Allocate 128 + 1 more than what's provided */ 1530 - diff = (qplqp->wqe_mode == BNXT_QPLIB_WQE_MODE_VARIABLE) ? 
1531 - 0 : BNXT_QPLIB_RESERVED_QP_WRS; 1532 - entries = bnxt_re_init_depth(entries + diff + 1, uctx); 1533 - sq->max_wqe = min_t(u32, entries, dev_attr->max_qp_wqes + diff + 1); 1395 + if (qplqp->wqe_mode != BNXT_QPLIB_WQE_MODE_VARIABLE) 1396 + diff = BNXT_QPLIB_RESERVED_QP_WRS; 1397 + sq->max_wqe = bnxt_re_init_depth( 1398 + init_attr->cap.max_send_wr + diff + 1, 1399 + dev_attr->max_qp_wqes + diff + 1, uctx); 1534 1400 if (qplqp->wqe_mode == BNXT_QPLIB_WQE_MODE_VARIABLE) 1535 1401 sq->max_sw_wqe = bnxt_qplib_get_depth(sq, qplqp->wqe_mode, true); 1536 1402 else ··· 1558 1422 struct bnxt_qplib_dev_attr *dev_attr; 1559 1423 struct bnxt_qplib_qp *qplqp; 1560 1424 struct bnxt_re_dev *rdev; 1561 - int entries; 1562 1425 1563 1426 rdev = qp->rdev; 1564 1427 qplqp = &qp->qplib_qp; 1565 1428 dev_attr = rdev->dev_attr; 1566 1429 1567 1430 if (!bnxt_qplib_is_chip_gen_p5_p7(rdev->chip_ctx)) { 1568 - entries = bnxt_re_init_depth(init_attr->cap.max_send_wr + 1, uctx); 1569 - qplqp->sq.max_wqe = min_t(u32, entries, 1570 - dev_attr->max_qp_wqes + 1); 1431 + qplqp->sq.max_wqe = 1432 + bnxt_re_init_depth(init_attr->cap.max_send_wr + 1, 1433 + dev_attr->max_qp_wqes + 1, uctx); 1571 1434 qplqp->sq.q_full_delta = qplqp->sq.max_wqe - 1572 1435 init_attr->cap.max_send_wr; 1573 1436 qplqp->sq.max_sge++; /* Need one extra sge to put UD header */ ··· 1597 1462 return qptype; 1598 1463 } 1599 1464 1465 + static void bnxt_re_qp_calculate_msn_psn_size(struct bnxt_re_qp *qp) 1466 + { 1467 + struct bnxt_qplib_qp *qplib_qp = &qp->qplib_qp; 1468 + struct bnxt_qplib_q *sq = &qplib_qp->sq; 1469 + struct bnxt_re_dev *rdev = qp->rdev; 1470 + u8 wqe_mode = qplib_qp->wqe_mode; 1471 + 1472 + if (rdev->dev_attr) 1473 + qplib_qp->is_host_msn_tbl = 1474 + _is_host_msn_table(rdev->dev_attr->dev_cap_flags2); 1475 + 1476 + if (qplib_qp->type == CMDQ_CREATE_QP_TYPE_RC) { 1477 + qplib_qp->psn_sz = bnxt_qplib_is_chip_gen_p5_p7(rdev->chip_ctx) ? 
1478 + sizeof(struct sq_psn_search_ext) : 1479 + sizeof(struct sq_psn_search); 1480 + if (qplib_qp->is_host_msn_tbl) { 1481 + qplib_qp->psn_sz = sizeof(struct sq_msn_search); 1482 + qplib_qp->msn = 0; 1483 + } 1484 + } 1485 + 1486 + /* Update msn tbl size */ 1487 + if (qplib_qp->is_host_msn_tbl && qplib_qp->psn_sz) { 1488 + if (wqe_mode == BNXT_QPLIB_WQE_MODE_STATIC) 1489 + qplib_qp->msn_tbl_sz = 1490 + roundup_pow_of_two(bnxt_qplib_set_sq_size(sq, wqe_mode)); 1491 + else 1492 + qplib_qp->msn_tbl_sz = 1493 + roundup_pow_of_two(bnxt_qplib_set_sq_size(sq, wqe_mode)) / 2; 1494 + qplib_qp->msn = 0; 1495 + } 1496 + } 1497 + 1600 1498 static int bnxt_re_init_qp_attr(struct bnxt_re_qp *qp, struct bnxt_re_pd *pd, 1601 1499 struct ib_qp_init_attr *init_attr, 1602 1500 struct bnxt_re_ucontext *uctx, ··· 1652 1484 qplqp->max_inline_data = init_attr->cap.max_inline_data; 1653 1485 qplqp->sig_type = init_attr->sq_sig_type == IB_SIGNAL_ALL_WR; 1654 1486 qptype = bnxt_re_init_qp_type(rdev, init_attr); 1655 - if (qptype < 0) { 1656 - rc = qptype; 1657 - goto out; 1658 - } 1487 + if (qptype < 0) 1488 + return qptype; 1659 1489 qplqp->type = (u8)qptype; 1660 1490 qplqp->wqe_mode = bnxt_re_is_var_size_supported(rdev, uctx); 1491 + qplqp->dev_cap_flags = dev_attr->dev_cap_flags; 1492 + qplqp->cctx = rdev->chip_ctx; 1661 1493 if (init_attr->qp_type == IB_QPT_RC) { 1662 1494 qplqp->max_rd_atomic = dev_attr->max_qp_rd_atom; 1663 1495 qplqp->max_dest_rd_atomic = dev_attr->max_qp_init_rd_atom; ··· 1687 1519 /* Setup RQ/SRQ */ 1688 1520 rc = bnxt_re_init_rq_attr(qp, init_attr, uctx); 1689 1521 if (rc) 1690 - goto out; 1522 + return rc; 1691 1523 if (init_attr->qp_type == IB_QPT_GSI) 1692 1524 bnxt_re_adjust_gsi_rq_attr(qp); 1693 1525 1694 1526 /* Setup SQ */ 1695 1527 rc = bnxt_re_init_sq_attr(qp, init_attr, uctx, ureq); 1696 1528 if (rc) 1697 - goto out; 1529 + return rc; 1698 1530 if (init_attr->qp_type == IB_QPT_GSI) 1699 1531 bnxt_re_adjust_gsi_sq_attr(qp, init_attr, uctx); 1700 1532 
1701 - if (uctx) /* This will update DPI and qp_handle */ 1533 + if (uctx) { /* This will update DPI and qp_handle */ 1702 1534 rc = bnxt_re_init_user_qp(rdev, pd, qp, uctx, ureq); 1703 - out: 1535 + if (rc) 1536 + return rc; 1537 + } 1538 + 1539 + bnxt_re_qp_calculate_msn_psn_size(qp); 1540 + 1541 + rc = bnxt_re_setup_qp_hwqs(qp); 1542 + if (rc) 1543 + goto free_umem; 1544 + 1545 + return 0; 1546 + free_umem: 1547 + bnxt_re_qp_free_umem(qp); 1704 1548 return rc; 1705 1549 } 1706 1550 ··· 1769 1589 1770 1590 rdev = qp->rdev; 1771 1591 qplqp = &qp->qplib_qp; 1592 + qplqp->cctx = rdev->chip_ctx; 1772 1593 1773 1594 qplqp->rq_hdr_buf_size = BNXT_QPLIB_MAX_QP1_RQ_HDR_SIZE_V2; 1774 1595 qplqp->sq_hdr_buf_size = BNXT_QPLIB_MAX_QP1_SQ_HDR_SIZE_V2; ··· 1852 1671 qp = container_of(ib_qp, struct bnxt_re_qp, ib_qp); 1853 1672 1854 1673 uctx = rdma_udata_to_drv_context(udata, struct bnxt_re_ucontext, ib_uctx); 1855 - if (udata) 1856 - if (ib_copy_from_udata(&ureq, udata, min(udata->inlen, sizeof(ureq)))) 1857 - return -EFAULT; 1674 + if (udata) { 1675 + rc = ib_copy_validate_udata_in_cm(udata, ureq, qp_handle, 0); 1676 + if (rc) 1677 + return rc; 1678 + } 1858 1679 1859 1680 rc = bnxt_re_test_qp_limits(rdev, qp_init_attr, dev_attr); 1860 1681 if (!rc) { ··· 1875 1692 if (rc == -ENODEV) 1876 1693 goto qp_destroy; 1877 1694 if (rc) 1878 - goto fail; 1695 + goto free_hwq; 1879 1696 } else { 1880 1697 rc = bnxt_qplib_create_qp(&rdev->qplib_res, &qp->qplib_qp); 1881 1698 if (rc) { 1882 1699 ibdev_err(&rdev->ibdev, "Failed to create HW QP"); 1883 - goto free_umem; 1700 + goto free_hwq; 1884 1701 } 1702 + 1885 1703 if (udata) { 1886 1704 struct bnxt_re_qp_resp resp; 1887 1705 1888 1706 resp.qpid = qp->qplib_qp.id; 1889 1707 resp.rsvd = 0; 1890 - rc = ib_copy_to_udata(udata, &resp, sizeof(resp)); 1891 - if (rc) { 1892 - ibdev_err(&rdev->ibdev, "Failed to copy QP udata"); 1708 + rc = ib_respond_udata(udata, resp); 1709 + if (rc) 1893 1710 goto qp_destroy; 1894 - } 1895 1711 } 1896 1712 
} 1897 1713 ··· 1931 1749 return 0; 1932 1750 qp_destroy: 1933 1751 bnxt_qplib_destroy_qp(&rdev->qplib_res, &qp->qplib_qp); 1934 - free_umem: 1935 - ib_umem_release(qp->rumem); 1936 - ib_umem_release(qp->sumem); 1752 + free_hwq: 1753 + bnxt_qplib_free_qp_res(&rdev->qplib_res, &qp->qplib_qp); 1754 + bnxt_re_qp_free_umem(qp); 1937 1755 fail: 1938 1756 return rc; 1939 1757 } ··· 2023 1841 ib_srq); 2024 1842 struct bnxt_re_dev *rdev = srq->rdev; 2025 1843 struct bnxt_qplib_srq *qplib_srq = &srq->qplib_srq; 1844 + int ret; 1845 + 1846 + ret = ib_is_udata_in_empty(udata); 1847 + if (ret) 1848 + return ret; 2026 1849 2027 1850 if (rdev->chip_ctx->modes.toggle_bits & BNXT_QPLIB_SRQ_TOGGLE_BIT) { 2028 1851 free_page((unsigned long)srq->uctx_srq_page); ··· 2036 1849 bnxt_qplib_destroy_srq(&rdev->qplib_res, qplib_srq); 2037 1850 ib_umem_release(srq->umem); 2038 1851 atomic_dec(&rdev->stats.res.srq_count); 2039 - return 0; 1852 + return ib_respond_empty_udata(udata); 2040 1853 } 2041 1854 2042 1855 static int bnxt_re_init_user_srq(struct bnxt_re_dev *rdev, ··· 2050 1863 int bytes = 0; 2051 1864 struct bnxt_re_ucontext *cntx = rdma_udata_to_drv_context( 2052 1865 udata, struct bnxt_re_ucontext, ib_uctx); 1866 + int rc; 2053 1867 2054 - if (ib_copy_from_udata(&ureq, udata, sizeof(ureq))) 2055 - return -EFAULT; 1868 + rc = ib_copy_validate_udata_in(udata, ureq, srq_handle); 1869 + if (rc) 1870 + return rc; 2056 1871 2057 1872 bytes = (qplib_srq->max_wqe * qplib_srq->wqe_size); 2058 1873 bytes = PAGE_ALIGN(bytes); ··· 2084 1895 struct bnxt_re_pd *pd; 2085 1896 struct ib_pd *ib_pd; 2086 1897 u32 active_srqs; 2087 - int rc, entries; 1898 + int rc; 2088 1899 2089 1900 ib_pd = ib_srq->pd; 2090 1901 pd = container_of(ib_pd, struct bnxt_re_pd, ib_pd); ··· 2110 1921 /* Allocate 1 more than what's provided so posting max doesn't 2111 1922 * mean empty 2112 1923 */ 2113 - entries = bnxt_re_init_depth(srq_init_attr->attr.max_wr + 1, uctx); 2114 - if (entries > dev_attr->max_srq_wqes + 1) 
2115 - entries = dev_attr->max_srq_wqes + 1; 2116 - srq->qplib_srq.max_wqe = entries; 1924 + srq->qplib_srq.max_wqe = 1925 + bnxt_re_init_depth(srq_init_attr->attr.max_wr + 1, 1926 + dev_attr->max_srq_wqes + 1, uctx); 2117 1927 2118 1928 srq->qplib_srq.max_sge = srq_init_attr->attr.max_sge; 2119 1929 /* 128 byte wqe size for SRQ . So use max sges */ ··· 2148 1960 } 2149 1961 resp.comp_mask |= BNXT_RE_SRQ_TOGGLE_PAGE_SUPPORT; 2150 1962 } 2151 - rc = ib_copy_to_udata(udata, &resp, sizeof(resp)); 1963 + rc = ib_respond_udata(udata, resp); 2152 1964 if (rc) { 2153 - ibdev_err(&rdev->ibdev, "SRQ copy to udata failed!"); 2154 1965 bnxt_qplib_destroy_srq(&rdev->qplib_res, 2155 1966 &srq->qplib_srq); 2156 1967 goto fail; ··· 2175 1988 struct bnxt_re_srq *srq = container_of(ib_srq, struct bnxt_re_srq, 2176 1989 ib_srq); 2177 1990 struct bnxt_re_dev *rdev = srq->rdev; 1991 + int ret; 1992 + 1993 + ret = ib_is_udata_in_empty(udata); 1994 + if (ret) 1995 + return ret; 2178 1996 2179 1997 switch (srq_attr_mask) { 2180 1998 case IB_SRQ_MAX_WR: ··· 2196 2004 /* On success, update the shadow */ 2197 2005 srq->srq_limit = srq_attr->srq_limit; 2198 2006 /* No need to Build and send response back to udata */ 2199 - return 0; 2007 + return ib_respond_empty_udata(udata); 2200 2008 default: 2201 2009 ibdev_err(&rdev->ibdev, 2202 2010 "Unsupported srq_attr_mask 0x%x", srq_attr_mask); ··· 2293 2101 struct bnxt_re_dev *rdev = qp->rdev; 2294 2102 struct bnxt_qplib_dev_attr *dev_attr = rdev->dev_attr; 2295 2103 enum ib_qp_state curr_qp_state, new_qp_state; 2296 - int rc, entries; 2104 + int rc; 2297 2105 unsigned int flags; 2298 2106 u8 nw_type; 2107 + 2108 + rc = ib_is_udata_in_empty(udata); 2109 + if (rc) 2110 + return rc; 2299 2111 2300 2112 if (qp_attr_mask & ~(IB_QP_ATTR_STANDARD_BITS | IB_QP_RATE_LIMIT)) 2301 2113 return -EOPNOTSUPP; ··· 2507 2311 "Create QP failed - max exceeded"); 2508 2312 return -EINVAL; 2509 2313 } 2510 - entries = bnxt_re_init_depth(qp_attr->cap.max_send_wr, 
uctx); 2511 - qp->qplib_qp.sq.max_wqe = min_t(u32, entries, 2512 - dev_attr->max_qp_wqes + 1); 2314 + qp->qplib_qp.sq.max_wqe = 2315 + bnxt_re_init_depth(qp_attr->cap.max_send_wr, 2316 + dev_attr->max_qp_wqes + 1, uctx); 2513 2317 qp->qplib_qp.sq.q_full_delta = qp->qplib_qp.sq.max_wqe - 2514 2318 qp_attr->cap.max_send_wr; 2515 2319 /* ··· 2520 2324 qp->qplib_qp.sq.q_full_delta -= 1; 2521 2325 qp->qplib_qp.sq.max_sge = qp_attr->cap.max_send_sge; 2522 2326 if (qp->qplib_qp.rq.max_wqe) { 2523 - entries = bnxt_re_init_depth(qp_attr->cap.max_recv_wr, uctx); 2524 - qp->qplib_qp.rq.max_wqe = 2525 - min_t(u32, entries, dev_attr->max_qp_wqes + 1); 2327 + qp->qplib_qp.rq.max_wqe = bnxt_re_init_depth( 2328 + qp_attr->cap.max_recv_wr, 2329 + dev_attr->max_qp_wqes + 1, uctx); 2526 2330 qp->qplib_qp.rq.max_sw_wqe = qp->qplib_qp.rq.max_wqe; 2527 2331 qp->qplib_qp.rq.q_full_delta = qp->qplib_qp.rq.max_wqe - 2528 2332 qp_attr->cap.max_recv_wr; ··· 2541 2345 ibdev_err(&rdev->ibdev, "Failed to modify HW QP"); 2542 2346 return rc; 2543 2347 } 2544 - if (ib_qp->qp_type == IB_QPT_GSI && rdev->gsi_ctx.gsi_sqp) 2348 + if (ib_qp->qp_type == IB_QPT_GSI && rdev->gsi_ctx.gsi_sqp) { 2545 2349 rc = bnxt_re_modify_shadow_qp(rdev, qp, qp_attr_mask); 2546 - return rc; 2350 + if (rc) 2351 + return rc; 2352 + } 2353 + return ib_respond_empty_udata(udata); 2547 2354 } 2548 2355 2549 2356 int bnxt_re_query_qp(struct ib_qp *ib_qp, struct ib_qp_attr *qp_attr, ··· 3321 3122 struct bnxt_qplib_nq *nq; 3322 3123 struct bnxt_re_dev *rdev; 3323 3124 struct bnxt_re_cq *cq; 3125 + int ret; 3324 3126 3325 3127 cq = container_of(ib_cq, struct bnxt_re_cq, ib_cq); 3326 3128 rdev = cq->rdev; 3327 3129 nq = cq->qplib_cq.nq; 3328 3130 cctx = rdev->chip_ctx; 3131 + 3132 + ret = ib_is_udata_in_empty(udata); 3133 + if (ret) 3134 + return ret; 3329 3135 3330 3136 if (cctx->modes.toggle_bits & BNXT_QPLIB_CQ_TOGGLE_BIT) { 3331 3137 free_page((unsigned long)cq->uctx_cq_page); ··· 3339 3135 
bnxt_qplib_destroy_cq(&rdev->qplib_res, &cq->qplib_cq); 3340 3136 3341 3137 bnxt_re_put_nq(rdev, nq); 3342 - ib_umem_release(cq->umem); 3343 3138 3344 3139 atomic_dec(&rdev->stats.res.cq_count); 3345 3140 kfree(cq->cql); 3141 + return ib_respond_empty_udata(udata); 3142 + } 3143 + 3144 + static int bnxt_re_setup_sginfo(struct bnxt_re_dev *rdev, 3145 + struct ib_umem *umem, 3146 + struct bnxt_qplib_sg_info *sginfo) 3147 + { 3148 + unsigned long page_size; 3149 + 3150 + if (!umem) 3151 + return -EINVAL; 3152 + 3153 + page_size = ib_umem_find_best_pgsz(umem, SZ_4K, 0); 3154 + if (!page_size || page_size != SZ_4K) 3155 + return -EINVAL; 3156 + 3157 + sginfo->umem = umem; 3158 + sginfo->npages = ib_umem_num_dma_blocks(umem, page_size); 3159 + sginfo->pgsize = page_size; 3160 + sginfo->pgshft = __builtin_ctz(page_size); 3346 3161 return 0; 3347 3162 } 3348 3163 3349 - int bnxt_re_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr, 3350 - struct uverbs_attr_bundle *attrs) 3164 + int bnxt_re_create_user_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr, 3165 + struct uverbs_attr_bundle *attrs) 3351 3166 { 3352 3167 struct bnxt_re_cq *cq = container_of(ibcq, struct bnxt_re_cq, ib_cq); 3353 3168 struct bnxt_re_dev *rdev = to_bnxt_re_dev(ibcq->device, ibdev); ··· 3375 3152 rdma_udata_to_drv_context(udata, struct bnxt_re_ucontext, ib_uctx); 3376 3153 struct bnxt_qplib_dev_attr *dev_attr = rdev->dev_attr; 3377 3154 struct bnxt_qplib_chip_ctx *cctx; 3378 - int cqe = attr->cqe; 3379 - int rc, entries; 3155 + struct bnxt_re_cq_resp resp = {}; 3156 + struct bnxt_re_cq_req req; 3157 + int rc; 3158 + u32 active_cqs, entries; 3159 + 3160 + if (attr->flags) 3161 + return -EOPNOTSUPP; 3162 + 3163 + /* Validate CQ fields */ 3164 + if (attr->cqe > dev_attr->max_cq_wqes) 3165 + return -EINVAL; 3166 + 3167 + cq->rdev = rdev; 3168 + cctx = rdev->chip_ctx; 3169 + cq->qplib_cq.cq_handle = (u64)(unsigned long)(&cq->qplib_cq); 3170 + 3171 + rc = 
ib_copy_validate_udata_in_cm(udata, req, cq_handle, 3172 + BNXT_RE_CQ_FIXED_NUM_CQE_ENABLE); 3173 + if (rc) 3174 + return rc; 3175 + 3176 + if (req.comp_mask & BNXT_RE_CQ_FIXED_NUM_CQE_ENABLE) 3177 + entries = attr->cqe; 3178 + else 3179 + entries = bnxt_re_init_depth(attr->cqe + 1, 3180 + dev_attr->max_cq_wqes + 1, uctx); 3181 + 3182 + if (!ibcq->umem) { 3183 + ibcq->umem = ib_umem_get(&rdev->ibdev, req.cq_va, 3184 + entries * sizeof(struct cq_base), 3185 + IB_ACCESS_LOCAL_WRITE); 3186 + if (IS_ERR(ibcq->umem)) 3187 + return PTR_ERR(ibcq->umem); 3188 + } 3189 + 3190 + rc = bnxt_re_setup_sginfo(rdev, ibcq->umem, &cq->qplib_cq.sg_info); 3191 + if (rc) 3192 + return rc; 3193 + 3194 + cq->qplib_cq.dpi = &uctx->dpi; 3195 + cq->qplib_cq.max_wqe = entries; 3196 + cq->qplib_cq.coalescing = &rdev->cq_coalescing; 3197 + cq->qplib_cq.nq = bnxt_re_get_nq(rdev); 3198 + cq->qplib_cq.cnq_hw_ring_id = cq->qplib_cq.nq->ring_id; 3199 + 3200 + rc = bnxt_qplib_create_cq(&rdev->qplib_res, &cq->qplib_cq); 3201 + if (rc) 3202 + return rc; 3203 + 3204 + cq->ib_cq.cqe = entries; 3205 + cq->cq_period = cq->qplib_cq.period; 3206 + active_cqs = atomic_inc_return(&rdev->stats.res.cq_count); 3207 + if (active_cqs > rdev->stats.res.cq_watermark) 3208 + rdev->stats.res.cq_watermark = active_cqs; 3209 + spin_lock_init(&cq->cq_lock); 3210 + 3211 + if (cctx->modes.toggle_bits & BNXT_QPLIB_CQ_TOGGLE_BIT) { 3212 + hash_add(rdev->cq_hash, &cq->hash_entry, cq->qplib_cq.id); 3213 + /* Allocate a page */ 3214 + cq->uctx_cq_page = (void *)get_zeroed_page(GFP_KERNEL); 3215 + if (!cq->uctx_cq_page) 3216 + return -ENOMEM; 3217 + 3218 + resp.comp_mask |= BNXT_RE_CQ_TOGGLE_PAGE_SUPPORT; 3219 + } 3220 + resp.cqid = cq->qplib_cq.id; 3221 + resp.tail = cq->qplib_cq.hwq.cons; 3222 + resp.phase = cq->qplib_cq.period; 3223 + rc = ib_respond_udata(udata, resp); 3224 + if (rc) { 3225 + bnxt_qplib_destroy_cq(&rdev->qplib_res, &cq->qplib_cq); 3226 + goto free_mem; 3227 + } 3228 + 3229 + return 0; 3230 + 3231 + free_mem: 
3232 + free_page((unsigned long)cq->uctx_cq_page); 3233 + return rc; 3234 + } 3235 + 3236 + int bnxt_re_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr, 3237 + struct uverbs_attr_bundle *attrs) 3238 + { 3239 + struct bnxt_re_cq *cq = container_of(ibcq, struct bnxt_re_cq, ib_cq); 3240 + struct bnxt_re_dev *rdev = to_bnxt_re_dev(ibcq->device, ibdev); 3241 + struct bnxt_qplib_dev_attr *dev_attr = rdev->dev_attr; 3242 + int rc; 3380 3243 u32 active_cqs; 3381 3244 3382 3245 if (attr->flags) 3383 3246 return -EOPNOTSUPP; 3384 3247 3385 3248 /* Validate CQ fields */ 3386 - if (cqe < 1 || cqe > dev_attr->max_cq_wqes) { 3387 - ibdev_err(&rdev->ibdev, "Failed to create CQ -max exceeded"); 3249 + if (attr->cqe > dev_attr->max_cq_wqes) 3388 3250 return -EINVAL; 3389 - } 3390 3251 3391 3252 cq->rdev = rdev; 3392 - cctx = rdev->chip_ctx; 3393 3253 cq->qplib_cq.cq_handle = (u64)(unsigned long)(&cq->qplib_cq); 3394 3254 3395 - entries = bnxt_re_init_depth(cqe + 1, uctx); 3396 - if (entries > dev_attr->max_cq_wqes + 1) 3397 - entries = dev_attr->max_cq_wqes + 1; 3255 + cq->max_cql = attr->cqe + 1; 3256 + cq->cql = kzalloc_objs(struct bnxt_qplib_cqe, cq->max_cql); 3257 + if (!cq->cql) 3258 + return -ENOMEM; 3398 3259 3399 - cq->qplib_cq.sg_info.pgsize = PAGE_SIZE; 3400 - cq->qplib_cq.sg_info.pgshft = PAGE_SHIFT; 3401 - if (udata) { 3402 - struct bnxt_re_cq_req req; 3403 - if (ib_copy_from_udata(&req, udata, sizeof(req))) { 3404 - rc = -EFAULT; 3405 - goto fail; 3406 - } 3407 - 3408 - cq->umem = ib_umem_get(&rdev->ibdev, req.cq_va, 3409 - entries * sizeof(struct cq_base), 3410 - IB_ACCESS_LOCAL_WRITE); 3411 - if (IS_ERR(cq->umem)) { 3412 - rc = PTR_ERR(cq->umem); 3413 - goto fail; 3414 - } 3415 - cq->qplib_cq.sg_info.umem = cq->umem; 3416 - cq->qplib_cq.dpi = &uctx->dpi; 3417 - } else { 3418 - cq->max_cql = min_t(u32, entries, MAX_CQL_PER_POLL); 3419 - cq->cql = kzalloc_objs(struct bnxt_qplib_cqe, cq->max_cql); 3420 - if (!cq->cql) { 3421 - rc = -ENOMEM; 3422 - goto 
fail; 3423 - } 3424 - 3425 - cq->qplib_cq.dpi = &rdev->dpi_privileged; 3426 - } 3427 - cq->qplib_cq.max_wqe = entries; 3260 + cq->qplib_cq.sg_info.pgsize = SZ_4K; 3261 + cq->qplib_cq.sg_info.pgshft = __builtin_ctz(SZ_4K); 3262 + cq->qplib_cq.dpi = &rdev->dpi_privileged; 3263 + cq->qplib_cq.max_wqe = cq->max_cql; 3428 3264 cq->qplib_cq.coalescing = &rdev->cq_coalescing; 3429 3265 cq->qplib_cq.nq = bnxt_re_get_nq(rdev); 3430 3266 cq->qplib_cq.cnq_hw_ring_id = cq->qplib_cq.nq->ring_id; ··· 3494 3212 goto fail; 3495 3213 } 3496 3214 3497 - cq->ib_cq.cqe = entries; 3215 + cq->ib_cq.cqe = cq->max_cql; 3498 3216 cq->cq_period = cq->qplib_cq.period; 3499 - 3500 3217 active_cqs = atomic_inc_return(&rdev->stats.res.cq_count); 3501 3218 if (active_cqs > rdev->stats.res.cq_watermark) 3502 3219 rdev->stats.res.cq_watermark = active_cqs; 3503 3220 spin_lock_init(&cq->cq_lock); 3504 3221 3505 - if (udata) { 3506 - struct bnxt_re_cq_resp resp = {}; 3507 - 3508 - if (cctx->modes.toggle_bits & BNXT_QPLIB_CQ_TOGGLE_BIT) { 3509 - hash_add(rdev->cq_hash, &cq->hash_entry, cq->qplib_cq.id); 3510 - /* Allocate a page */ 3511 - cq->uctx_cq_page = (void *)get_zeroed_page(GFP_KERNEL); 3512 - if (!cq->uctx_cq_page) { 3513 - rc = -ENOMEM; 3514 - goto c2fail; 3515 - } 3516 - resp.comp_mask |= BNXT_RE_CQ_TOGGLE_PAGE_SUPPORT; 3517 - } 3518 - resp.cqid = cq->qplib_cq.id; 3519 - resp.tail = cq->qplib_cq.hwq.cons; 3520 - resp.phase = cq->qplib_cq.period; 3521 - resp.rsvd = 0; 3522 - rc = ib_copy_to_udata(udata, &resp, min(sizeof(resp), udata->outlen)); 3523 - if (rc) { 3524 - ibdev_err(&rdev->ibdev, "Failed to copy CQ udata"); 3525 - bnxt_qplib_destroy_cq(&rdev->qplib_res, &cq->qplib_cq); 3526 - goto free_mem; 3527 - } 3528 - } 3529 - 3530 3222 return 0; 3531 3223 3532 - free_mem: 3533 - free_page((unsigned long)cq->uctx_cq_page); 3534 - c2fail: 3535 - ib_umem_release(cq->umem); 3536 3224 fail: 3537 3225 kfree(cq->cql); 3538 3226 return rc; ··· 3516 3264 3517 3265 cq->qplib_cq.max_wqe = 
cq->resize_cqe; 3518 3266 if (cq->resize_umem) { 3519 - ib_umem_release(cq->umem); 3520 - cq->umem = cq->resize_umem; 3267 + ib_umem_release(cq->ib_cq.umem); 3268 + cq->ib_cq.umem = cq->resize_umem; 3521 3269 cq->resize_umem = NULL; 3522 3270 cq->resize_cqe = 0; 3523 3271 } 3524 3272 } 3525 3273 3526 - int bnxt_re_resize_cq(struct ib_cq *ibcq, int cqe, struct ib_udata *udata) 3274 + int bnxt_re_resize_cq(struct ib_cq *ibcq, unsigned int cqe, 3275 + struct ib_udata *udata) 3527 3276 { 3528 3277 struct bnxt_qplib_sg_info sg_info = {}; 3529 3278 struct bnxt_qplib_dpi *orig_dpi = NULL; ··· 3533 3280 struct bnxt_re_resize_cq_req req; 3534 3281 struct bnxt_re_dev *rdev; 3535 3282 struct bnxt_re_cq *cq; 3536 - int rc, entries; 3283 + int rc; 3284 + u32 entries; 3537 3285 3538 3286 cq = container_of(ibcq, struct bnxt_re_cq, ib_cq); 3539 3287 rdev = cq->rdev; ··· 3551 3297 } 3552 3298 3553 3299 /* Check the requested cq depth out of supported depth */ 3554 - if (cqe < 1 || cqe > dev_attr->max_cq_wqes) { 3555 - ibdev_err(&rdev->ibdev, "Resize CQ %#x failed - out of range cqe %d", 3556 - cq->qplib_cq.id, cqe); 3300 + if (cqe > dev_attr->max_cq_wqes) 3557 3301 return -EINVAL; 3558 - } 3559 3302 3560 3303 uctx = rdma_udata_to_drv_context(udata, struct bnxt_re_ucontext, ib_uctx); 3561 - entries = bnxt_re_init_depth(cqe + 1, uctx); 3562 - if (entries > dev_attr->max_cq_wqes + 1) 3563 - entries = dev_attr->max_cq_wqes + 1; 3304 + entries = bnxt_re_init_depth(cqe + 1, dev_attr->max_cq_wqes + 1, uctx); 3564 3305 3565 3306 /* uverbs consumer */ 3566 - if (ib_copy_from_udata(&req, udata, sizeof(req))) { 3567 - rc = -EFAULT; 3307 + rc = ib_copy_validate_udata_in(udata, req, cq_va); 3308 + if (rc) 3568 3309 goto fail; 3569 - } 3570 3310 3571 3311 cq->resize_umem = ib_umem_get(&rdev->ibdev, req.cq_va, 3572 3312 entries * sizeof(struct cq_base), ··· 3591 3343 cq->ib_cq.cqe = cq->resize_cqe; 3592 3344 atomic_inc(&rdev->stats.res.resize_count); 3593 3345 3594 - return 0; 3346 + return 
ib_respond_empty_udata(udata); 3595 3347 3596 3348 fail: 3597 3349 if (cq->resize_umem) { ··· 4113 3865 /* User CQ; the only processing we do is to 4114 3866 * complete any pending CQ resize operation. 4115 3867 */ 4116 - if (cq->umem) { 3868 + if (cq->ib_cq.umem) { 4117 3869 if (cq->resize_umem) 4118 3870 bnxt_re_resize_cq_complete(cq); 4119 3871 return 0; ··· 4323 4075 struct bnxt_re_dev *rdev = mr->rdev; 4324 4076 int rc; 4325 4077 4078 + rc = ib_is_udata_in_empty(udata); 4079 + if (rc) 4080 + return rc; 4081 + 4326 4082 rc = bnxt_qplib_free_mrw(&rdev->qplib_res, &mr->qplib_mr); 4327 4083 if (rc) { 4328 4084 ibdev_err(&rdev->ibdev, "Dereg MR failed: %#x\n", rc); ··· 4344 4092 4345 4093 kfree(mr); 4346 4094 atomic_dec(&rdev->stats.res.mr_count); 4347 - return rc; 4095 + if (rc) 4096 + return rc; 4097 + return ib_respond_empty_udata(udata); 4348 4098 } 4349 4099 4350 4100 static int bnxt_re_set_page(struct ib_mr *ib_mr, u64 addr) ··· 4401 4147 mr->ib_mr.lkey = mr->qplib_mr.lkey; 4402 4148 mr->ib_mr.rkey = mr->ib_mr.lkey; 4403 4149 4404 - mr->pages = kcalloc(max_num_sg, sizeof(u64), GFP_KERNEL); 4150 + mr->pages = kzalloc_objs(u64, max_num_sg); 4405 4151 if (!mr->pages) { 4406 4152 rc = -ENOMEM; 4407 4153 goto fail; ··· 4436 4182 struct bnxt_re_mw *mw; 4437 4183 u32 active_mws; 4438 4184 int rc; 4185 + 4186 + rc = ib_is_udata_in_empty(udata); 4187 + if (rc) 4188 + return ERR_PTR(rc); 4439 4189 4440 4190 mw = kzalloc_obj(*mw); 4441 4191 if (!mw) ··· 4568 4310 struct bnxt_re_dev *rdev = pd->rdev; 4569 4311 struct ib_umem *umem; 4570 4312 struct ib_mr *ib_mr; 4313 + int ret; 4314 + 4315 + ret = ib_is_udata_in_empty(udata); 4316 + if (ret) 4317 + return ERR_PTR(ret); 4571 4318 4572 4319 if (dmah) 4573 4320 return ERR_PTR(-EOPNOTSUPP); ··· 4677 4414 if (_is_modify_qp_rate_limit_supported(dev_attr->dev_cap_flags2)) 4678 4415 resp.comp_mask |= BNXT_RE_UCNTX_CMASK_QP_RATE_LIMIT_ENABLED; 4679 4416 4680 - if (udata->inlen >= sizeof(ureq)) { 4681 - rc = 
ib_copy_from_udata(&ureq, udata, min(udata->inlen, sizeof(ureq))); 4417 + if (udata->inlen) { 4418 + rc = ib_copy_validate_udata_in_cm( 4419 + udata, ureq, comp_mask, 4420 + BNXT_RE_COMP_MASK_REQ_UCNTX_POW2_SUPPORT | 4421 + BNXT_RE_COMP_MASK_REQ_UCNTX_VAR_WQE_SUPPORT); 4682 4422 if (rc) 4683 4423 goto cfail; 4684 4424 if (ureq.comp_mask & BNXT_RE_COMP_MASK_REQ_UCNTX_POW2_SUPPORT) { ··· 4696 4430 } 4697 4431 } 4698 4432 4699 - rc = ib_copy_to_udata(udata, &resp, min(udata->outlen, sizeof(resp))); 4700 - if (rc) { 4701 - ibdev_err(ibdev, "Failed to copy user context"); 4702 - rc = -EFAULT; 4433 + rc = ib_respond_udata(udata, resp); 4434 + if (rc) 4703 4435 goto cfail; 4704 - } 4705 4436 4706 4437 return 0; 4707 4438 cfail: ··· 4756 4493 struct bnxt_re_dev *rdev = qp->rdev; 4757 4494 struct bnxt_re_flow *flow; 4758 4495 int rc; 4496 + 4497 + rc = ib_is_udata_in_empty(udata); 4498 + if (rc) 4499 + return ERR_PTR(rc); 4759 4500 4760 4501 if (attr->type != IB_FLOW_ATTR_SNIFFER || 4761 4502 !rdev->rcfw.roce_mirror) ··· 4819 4552 kfree(flow); 4820 4553 4821 4554 return rc; 4822 - } 4823 - 4824 - static struct bnxt_re_cq *bnxt_re_search_for_cq(struct bnxt_re_dev *rdev, u32 cq_id) 4825 - { 4826 - struct bnxt_re_cq *cq = NULL, *tmp_cq; 4827 - 4828 - hash_for_each_possible(rdev->cq_hash, tmp_cq, hash_entry, cq_id) { 4829 - if (tmp_cq->qplib_cq.id == cq_id) { 4830 - cq = tmp_cq; 4831 - break; 4832 - } 4833 - } 4834 - return cq; 4835 - } 4836 - 4837 - static struct bnxt_re_srq *bnxt_re_search_for_srq(struct bnxt_re_dev *rdev, u32 srq_id) 4838 - { 4839 - struct bnxt_re_srq *srq = NULL, *tmp_srq; 4840 - 4841 - hash_for_each_possible(rdev->srq_hash, tmp_srq, hash_entry, srq_id) { 4842 - if (tmp_srq->qplib_srq.id == srq_id) { 4843 - srq = tmp_srq; 4844 - break; 4845 - } 4846 - } 4847 - return srq; 4848 4555 } 4849 4556 4850 4557 /* Helper function to mmap the virtual memory from user app */ ··· 4923 4682 ret |= IB_MAD_RESULT_REPLY; 4924 4683 return ret; 4925 4684 } 4926 - 4927 - 
static int UVERBS_HANDLER(BNXT_RE_METHOD_NOTIFY_DRV)(struct uverbs_attr_bundle *attrs) 4928 - { 4929 - struct bnxt_re_ucontext *uctx; 4930 - 4931 - uctx = container_of(ib_uverbs_get_ucontext(attrs), struct bnxt_re_ucontext, ib_uctx); 4932 - bnxt_re_pacing_alert(uctx->rdev); 4933 - return 0; 4934 - } 4935 - 4936 - static int UVERBS_HANDLER(BNXT_RE_METHOD_ALLOC_PAGE)(struct uverbs_attr_bundle *attrs) 4937 - { 4938 - struct ib_uobject *uobj = uverbs_attr_get_uobject(attrs, BNXT_RE_ALLOC_PAGE_HANDLE); 4939 - enum bnxt_re_alloc_page_type alloc_type; 4940 - struct bnxt_re_user_mmap_entry *entry; 4941 - enum bnxt_re_mmap_flag mmap_flag; 4942 - struct bnxt_qplib_chip_ctx *cctx; 4943 - struct bnxt_re_ucontext *uctx; 4944 - struct bnxt_re_dev *rdev; 4945 - u64 mmap_offset; 4946 - u32 length; 4947 - u32 dpi; 4948 - u64 addr; 4949 - int err; 4950 - 4951 - uctx = container_of(ib_uverbs_get_ucontext(attrs), struct bnxt_re_ucontext, ib_uctx); 4952 - if (IS_ERR(uctx)) 4953 - return PTR_ERR(uctx); 4954 - 4955 - err = uverbs_get_const(&alloc_type, attrs, BNXT_RE_ALLOC_PAGE_TYPE); 4956 - if (err) 4957 - return err; 4958 - 4959 - rdev = uctx->rdev; 4960 - cctx = rdev->chip_ctx; 4961 - 4962 - switch (alloc_type) { 4963 - case BNXT_RE_ALLOC_WC_PAGE: 4964 - if (cctx->modes.db_push) { 4965 - if (bnxt_qplib_alloc_dpi(&rdev->qplib_res, &uctx->wcdpi, 4966 - uctx, BNXT_QPLIB_DPI_TYPE_WC)) 4967 - return -ENOMEM; 4968 - length = PAGE_SIZE; 4969 - dpi = uctx->wcdpi.dpi; 4970 - addr = (u64)uctx->wcdpi.umdbr; 4971 - mmap_flag = BNXT_RE_MMAP_WC_DB; 4972 - } else { 4973 - return -EINVAL; 4974 - } 4975 - 4976 - break; 4977 - case BNXT_RE_ALLOC_DBR_BAR_PAGE: 4978 - length = PAGE_SIZE; 4979 - addr = (u64)rdev->pacing.dbr_bar_addr; 4980 - mmap_flag = BNXT_RE_MMAP_DBR_BAR; 4981 - break; 4982 - 4983 - case BNXT_RE_ALLOC_DBR_PAGE: 4984 - length = PAGE_SIZE; 4985 - addr = (u64)rdev->pacing.dbr_page; 4986 - mmap_flag = BNXT_RE_MMAP_DBR_PAGE; 4987 - break; 4988 - 4989 - default: 4990 - return -EOPNOTSUPP; 
4991 - } 4992 - 4993 - entry = bnxt_re_mmap_entry_insert(uctx, addr, mmap_flag, &mmap_offset); 4994 - if (!entry) 4995 - return -ENOMEM; 4996 - 4997 - uobj->object = entry; 4998 - uverbs_finalize_uobj_create(attrs, BNXT_RE_ALLOC_PAGE_HANDLE); 4999 - err = uverbs_copy_to(attrs, BNXT_RE_ALLOC_PAGE_MMAP_OFFSET, 5000 - &mmap_offset, sizeof(mmap_offset)); 5001 - if (err) 5002 - return err; 5003 - 5004 - err = uverbs_copy_to(attrs, BNXT_RE_ALLOC_PAGE_MMAP_LENGTH, 5005 - &length, sizeof(length)); 5006 - if (err) 5007 - return err; 5008 - 5009 - err = uverbs_copy_to(attrs, BNXT_RE_ALLOC_PAGE_DPI, 5010 - &dpi, sizeof(dpi)); 5011 - if (err) 5012 - return err; 5013 - 5014 - return 0; 5015 - } 5016 - 5017 - static int alloc_page_obj_cleanup(struct ib_uobject *uobject, 5018 - enum rdma_remove_reason why, 5019 - struct uverbs_attr_bundle *attrs) 5020 - { 5021 - struct bnxt_re_user_mmap_entry *entry = uobject->object; 5022 - struct bnxt_re_ucontext *uctx = entry->uctx; 5023 - 5024 - switch (entry->mmap_flag) { 5025 - case BNXT_RE_MMAP_WC_DB: 5026 - if (uctx && uctx->wcdpi.dbr) { 5027 - struct bnxt_re_dev *rdev = uctx->rdev; 5028 - 5029 - bnxt_qplib_dealloc_dpi(&rdev->qplib_res, &uctx->wcdpi); 5030 - uctx->wcdpi.dbr = NULL; 5031 - } 5032 - break; 5033 - case BNXT_RE_MMAP_DBR_BAR: 5034 - case BNXT_RE_MMAP_DBR_PAGE: 5035 - break; 5036 - default: 5037 - goto exit; 5038 - } 5039 - rdma_user_mmap_entry_remove(&entry->rdma_entry); 5040 - exit: 5041 - return 0; 5042 - } 5043 - 5044 - DECLARE_UVERBS_NAMED_METHOD(BNXT_RE_METHOD_ALLOC_PAGE, 5045 - UVERBS_ATTR_IDR(BNXT_RE_ALLOC_PAGE_HANDLE, 5046 - BNXT_RE_OBJECT_ALLOC_PAGE, 5047 - UVERBS_ACCESS_NEW, 5048 - UA_MANDATORY), 5049 - UVERBS_ATTR_CONST_IN(BNXT_RE_ALLOC_PAGE_TYPE, 5050 - enum bnxt_re_alloc_page_type, 5051 - UA_MANDATORY), 5052 - UVERBS_ATTR_PTR_OUT(BNXT_RE_ALLOC_PAGE_MMAP_OFFSET, 5053 - UVERBS_ATTR_TYPE(u64), 5054 - UA_MANDATORY), 5055 - UVERBS_ATTR_PTR_OUT(BNXT_RE_ALLOC_PAGE_MMAP_LENGTH, 5056 - UVERBS_ATTR_TYPE(u32), 5057 - 
UA_MANDATORY), 5058 - UVERBS_ATTR_PTR_OUT(BNXT_RE_ALLOC_PAGE_DPI, 5059 - UVERBS_ATTR_TYPE(u32), 5060 - UA_MANDATORY)); 5061 - 5062 - DECLARE_UVERBS_NAMED_METHOD_DESTROY(BNXT_RE_METHOD_DESTROY_PAGE, 5063 - UVERBS_ATTR_IDR(BNXT_RE_DESTROY_PAGE_HANDLE, 5064 - BNXT_RE_OBJECT_ALLOC_PAGE, 5065 - UVERBS_ACCESS_DESTROY, 5066 - UA_MANDATORY)); 5067 - 5068 - DECLARE_UVERBS_NAMED_OBJECT(BNXT_RE_OBJECT_ALLOC_PAGE, 5069 - UVERBS_TYPE_ALLOC_IDR(alloc_page_obj_cleanup), 5070 - &UVERBS_METHOD(BNXT_RE_METHOD_ALLOC_PAGE), 5071 - &UVERBS_METHOD(BNXT_RE_METHOD_DESTROY_PAGE)); 5072 - 5073 - DECLARE_UVERBS_NAMED_METHOD(BNXT_RE_METHOD_NOTIFY_DRV); 5074 - 5075 - DECLARE_UVERBS_GLOBAL_METHODS(BNXT_RE_OBJECT_NOTIFY_DRV, 5076 - &UVERBS_METHOD(BNXT_RE_METHOD_NOTIFY_DRV)); 5077 - 5078 - /* Toggle MEM */ 5079 - static int UVERBS_HANDLER(BNXT_RE_METHOD_GET_TOGGLE_MEM)(struct uverbs_attr_bundle *attrs) 5080 - { 5081 - struct ib_uobject *uobj = uverbs_attr_get_uobject(attrs, BNXT_RE_TOGGLE_MEM_HANDLE); 5082 - enum bnxt_re_mmap_flag mmap_flag = BNXT_RE_MMAP_TOGGLE_PAGE; 5083 - enum bnxt_re_get_toggle_mem_type res_type; 5084 - struct bnxt_re_user_mmap_entry *entry; 5085 - struct bnxt_re_ucontext *uctx; 5086 - struct ib_ucontext *ib_uctx; 5087 - struct bnxt_re_dev *rdev; 5088 - struct bnxt_re_srq *srq; 5089 - u32 length = PAGE_SIZE; 5090 - struct bnxt_re_cq *cq; 5091 - u64 mem_offset; 5092 - u32 offset = 0; 5093 - u64 addr = 0; 5094 - u32 res_id; 5095 - int err; 5096 - 5097 - ib_uctx = ib_uverbs_get_ucontext(attrs); 5098 - if (IS_ERR(ib_uctx)) 5099 - return PTR_ERR(ib_uctx); 5100 - 5101 - err = uverbs_get_const(&res_type, attrs, BNXT_RE_TOGGLE_MEM_TYPE); 5102 - if (err) 5103 - return err; 5104 - 5105 - uctx = container_of(ib_uctx, struct bnxt_re_ucontext, ib_uctx); 5106 - rdev = uctx->rdev; 5107 - err = uverbs_copy_from(&res_id, attrs, BNXT_RE_TOGGLE_MEM_RES_ID); 5108 - if (err) 5109 - return err; 5110 - 5111 - switch (res_type) { 5112 - case BNXT_RE_CQ_TOGGLE_MEM: 5113 - cq = 
bnxt_re_search_for_cq(rdev, res_id); 5114 - if (!cq) 5115 - return -EINVAL; 5116 - 5117 - addr = (u64)cq->uctx_cq_page; 5118 - break; 5119 - case BNXT_RE_SRQ_TOGGLE_MEM: 5120 - srq = bnxt_re_search_for_srq(rdev, res_id); 5121 - if (!srq) 5122 - return -EINVAL; 5123 - 5124 - addr = (u64)srq->uctx_srq_page; 5125 - break; 5126 - 5127 - default: 5128 - return -EOPNOTSUPP; 5129 - } 5130 - 5131 - entry = bnxt_re_mmap_entry_insert(uctx, addr, mmap_flag, &mem_offset); 5132 - if (!entry) 5133 - return -ENOMEM; 5134 - 5135 - uobj->object = entry; 5136 - uverbs_finalize_uobj_create(attrs, BNXT_RE_TOGGLE_MEM_HANDLE); 5137 - err = uverbs_copy_to(attrs, BNXT_RE_TOGGLE_MEM_MMAP_PAGE, 5138 - &mem_offset, sizeof(mem_offset)); 5139 - if (err) 5140 - return err; 5141 - 5142 - err = uverbs_copy_to(attrs, BNXT_RE_TOGGLE_MEM_MMAP_LENGTH, 5143 - &length, sizeof(length)); 5144 - if (err) 5145 - return err; 5146 - 5147 - err = uverbs_copy_to(attrs, BNXT_RE_TOGGLE_MEM_MMAP_OFFSET, 5148 - &offset, sizeof(offset)); 5149 - if (err) 5150 - return err; 5151 - 5152 - return 0; 5153 - } 5154 - 5155 - static int get_toggle_mem_obj_cleanup(struct ib_uobject *uobject, 5156 - enum rdma_remove_reason why, 5157 - struct uverbs_attr_bundle *attrs) 5158 - { 5159 - struct bnxt_re_user_mmap_entry *entry = uobject->object; 5160 - 5161 - rdma_user_mmap_entry_remove(&entry->rdma_entry); 5162 - return 0; 5163 - } 5164 - 5165 - DECLARE_UVERBS_NAMED_METHOD(BNXT_RE_METHOD_GET_TOGGLE_MEM, 5166 - UVERBS_ATTR_IDR(BNXT_RE_TOGGLE_MEM_HANDLE, 5167 - BNXT_RE_OBJECT_GET_TOGGLE_MEM, 5168 - UVERBS_ACCESS_NEW, 5169 - UA_MANDATORY), 5170 - UVERBS_ATTR_CONST_IN(BNXT_RE_TOGGLE_MEM_TYPE, 5171 - enum bnxt_re_get_toggle_mem_type, 5172 - UA_MANDATORY), 5173 - UVERBS_ATTR_PTR_IN(BNXT_RE_TOGGLE_MEM_RES_ID, 5174 - UVERBS_ATTR_TYPE(u32), 5175 - UA_MANDATORY), 5176 - UVERBS_ATTR_PTR_OUT(BNXT_RE_TOGGLE_MEM_MMAP_PAGE, 5177 - UVERBS_ATTR_TYPE(u64), 5178 - UA_MANDATORY), 5179 - UVERBS_ATTR_PTR_OUT(BNXT_RE_TOGGLE_MEM_MMAP_OFFSET, 5180 - 
UVERBS_ATTR_TYPE(u32), 5181 - UA_MANDATORY), 5182 - UVERBS_ATTR_PTR_OUT(BNXT_RE_TOGGLE_MEM_MMAP_LENGTH, 5183 - UVERBS_ATTR_TYPE(u32), 5184 - UA_MANDATORY)); 5185 - 5186 - DECLARE_UVERBS_NAMED_METHOD_DESTROY(BNXT_RE_METHOD_RELEASE_TOGGLE_MEM, 5187 - UVERBS_ATTR_IDR(BNXT_RE_RELEASE_TOGGLE_MEM_HANDLE, 5188 - BNXT_RE_OBJECT_GET_TOGGLE_MEM, 5189 - UVERBS_ACCESS_DESTROY, 5190 - UA_MANDATORY)); 5191 - 5192 - DECLARE_UVERBS_NAMED_OBJECT(BNXT_RE_OBJECT_GET_TOGGLE_MEM, 5193 - UVERBS_TYPE_ALLOC_IDR(get_toggle_mem_obj_cleanup), 5194 - &UVERBS_METHOD(BNXT_RE_METHOD_GET_TOGGLE_MEM), 5195 - &UVERBS_METHOD(BNXT_RE_METHOD_RELEASE_TOGGLE_MEM)); 5196 - 5197 - const struct uapi_definition bnxt_re_uapi_defs[] = { 5198 - UAPI_DEF_CHAIN_OBJ_TREE_NAMED(BNXT_RE_OBJECT_ALLOC_PAGE), 5199 - UAPI_DEF_CHAIN_OBJ_TREE_NAMED(BNXT_RE_OBJECT_NOTIFY_DRV), 5200 - UAPI_DEF_CHAIN_OBJ_TREE_NAMED(BNXT_RE_OBJECT_GET_TOGGLE_MEM), 5201 - {} 5202 - };
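In the ib_verbs.c hunks above, open-coded `ib_copy_from_udata()` calls are replaced by helpers such as `ib_copy_validate_udata_in(udata, req, cq_va)`, `ib_copy_validate_udata_in_cm(udata, ureq, comp_mask, ...)` and `ib_is_udata_in_empty()`. This diff does not define those helpers, so the sketch below only illustrates the general "robust udata" idea in userspace. It is hypothetical and rests on these assumptions: the input must at least cover the named field, bytes beyond the known struct must be zero, and unknown `comp_mask` bits are rejected. `struct demo_req`, `END_OF()`, `copy_validate_in()` and the chosen error codes are illustrative only. None of them are kernel API.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical request layout; NOT the real bnxt_re ABI struct. */
struct demo_req {
	uint64_t cq_va;
	uint32_t comp_mask;
	uint32_t reserved;
};

/* Size of a struct up to and including member m (assumed "minimum length" rule). */
#define END_OF(type, m) (offsetof(type, m) + sizeof(((type *)0)->m))

/*
 * Hypothetical validating copy of a user buffer of length inlen into *req:
 *  - shorter than min_len (mandatory field missing)      -> -EINVAL
 *  - nonzero bytes beyond sizeof(*req) (unknown fields)  -> -EOPNOTSUPP
 *  - comp_mask bits outside 'supported'                  -> -EOPNOTSUPP
 * Pass supported = ~0u to skip the comp_mask check.
 */
static int copy_validate_in(struct demo_req *req, const void *ubuf,
			    size_t inlen, size_t min_len, uint32_t supported)
{
	const unsigned char *p = ubuf;
	size_t i;

	if (inlen < min_len)
		return -EINVAL;
	for (i = sizeof(*req); i < inlen; i++)
		if (p[i])
			return -EOPNOTSUPP;
	/* Fields the caller did not send read as zero. */
	memset(req, 0, sizeof(*req));
	memcpy(req, ubuf, inlen < sizeof(*req) ? inlen : sizeof(*req));
	if (req->comp_mask & ~supported)
		return -EOPNOTSUPP;
	return 0;
}
```

The point of this pattern is that an older kernel fails loudly when it receives a newer userspace request it cannot honour. It does not silently ignore the extra fields, as a plain truncating copy would.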

+20 -5

drivers/infiniband/hw/bnxt_re/ib_verbs.h

··· 108 108 struct bnxt_qplib_cqe *cql; 109 109 #define MAX_CQL_PER_POLL 1024 110 110 u32 max_cql; 111 - struct ib_umem *umem; 112 111 struct ib_umem *resize_umem; 113 112 int resize_cqe; 114 113 void *uctx_cq_page; ··· 163 164 u8 mmap_flag; 164 165 }; 165 166 167 + struct bnxt_re_dbr_obj { 168 + struct bnxt_re_dev *rdev; 169 + struct bnxt_qplib_dpi dpi; 170 + struct bnxt_re_user_mmap_entry *entry; 171 + atomic_t usecnt; /* QPs using this dbr */ 172 + }; 173 + 166 174 struct bnxt_re_flow { 167 175 struct ib_flow ib_flow; 168 176 struct bnxt_re_dev *rdev; ··· 190 184 BNXT_RE_UCNTX_CAP_VAR_WQE_ENABLED = 0x2ULL, 191 185 }; 192 186 193 - static inline u32 bnxt_re_init_depth(u32 ent, struct bnxt_re_ucontext *uctx) 187 + static inline u32 bnxt_re_init_depth(u32 ent, u32 max, 188 + struct bnxt_re_ucontext *uctx) 194 189 { 195 - return uctx ? (uctx->cmask & BNXT_RE_UCNTX_CAP_POW2_DISABLED) ? 196 - ent : roundup_pow_of_two(ent) : ent; 190 + if (uctx && !(uctx->cmask & BNXT_RE_UCNTX_CAP_POW2_DISABLED)) 191 + return min(roundup_pow_of_two(ent), max); 192 + 193 + return ent; 197 194 } 198 195 199 196 static inline bool bnxt_re_is_var_size_supported(struct bnxt_re_dev *rdev, ··· 256 247 const struct ib_recv_wr **bad_recv_wr); 257 248 int bnxt_re_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr, 258 249 struct uverbs_attr_bundle *attrs); 259 - int bnxt_re_resize_cq(struct ib_cq *ibcq, int cqe, struct ib_udata *udata); 250 + int bnxt_re_create_user_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr, 251 + struct uverbs_attr_bundle *attrs); 252 + int bnxt_re_resize_cq(struct ib_cq *ibcq, unsigned int cqe, 253 + struct ib_udata *udata); 260 254 int bnxt_re_destroy_cq(struct ib_cq *cq, struct ib_udata *udata); 261 255 int bnxt_re_poll_cq(struct ib_cq *cq, int num_entries, struct ib_wc *wc); 262 256 int bnxt_re_req_notify_cq(struct ib_cq *cq, enum ib_cq_notify_flags flags); ··· 305 293 306 294 unsigned long bnxt_re_lock_cqs(struct bnxt_re_qp *qp); 307 295 void 
bnxt_re_unlock_cqs(struct bnxt_re_qp *qp, unsigned long flags); 296 + struct bnxt_re_user_mmap_entry* 297 + bnxt_re_mmap_entry_insert(struct bnxt_re_ucontext *uctx, u64 mem_offset, 298 + enum bnxt_re_mmap_flag mmap_flag, u64 *offset); 308 299 #endif /* __BNXT_RE_IB_VERBS_H__ */
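The reworked `bnxt_re_init_depth()` in ib_verbs.h now takes a `max` argument. It rounds the depth up to a power of two only when a user context exists and has not set `BNXT_RE_UCNTX_CAP_POW2_DISABLED`, and it clamps only that rounded value to `max`. The userspace sketch below mirrors that logic. `roundup_pow2()` is a stand-in for the kernel's `roundup_pow_of_two()`, and the two `bool` parameters stand in for the `uctx` pointer and `cmask` test. These names are illustrative, not driver code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Userspace stand-in for roundup_pow_of_two(); valid for 1 <= n <= 2^31. */
static uint32_t roundup_pow2(uint32_t n)
{
	uint32_t p = 1;

	while (p < n)
		p <<= 1;
	return p;
}

/*
 * Mirrors the new bnxt_re_init_depth(ent, max, uctx):
 * have_uctx     ~ (uctx != NULL)
 * pow2_disabled ~ (uctx->cmask & BNXT_RE_UCNTX_CAP_POW2_DISABLED)
 */
static uint32_t init_depth(uint32_t ent, uint32_t max, bool have_uctx,
			   bool pow2_disabled)
{
	if (have_uctx && !pow2_disabled) {
		uint32_t r = roundup_pow2(ent);

		return r < max ? r : max;	/* min(roundup_pow_of_two(ent), max) */
	}
	return ent;			/* unrounded depth is returned unclamped */
}
```

For example, resizing to `cqe = 1000` with `max_cq_wqes = 65535` gives `init_depth(1001, 65536, true, false) == 1024`. The unrounded path needs no clamp in `bnxt_re_resize_cq()` because the preceding `cqe > dev_attr->max_cq_wqes` check already keeps `cqe + 1` at or below `max_cq_wqes + 1`.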

+3 -1

drivers/infiniband/hw/bnxt_re/main.c

··· 1326 1326 .owner = THIS_MODULE, 1327 1327 .driver_id = RDMA_DRIVER_BNXT_RE, 1328 1328 .uverbs_abi_ver = BNXT_RE_ABI_VERSION, 1329 + .uverbs_robust_udata = true, 1329 1330 1330 1331 .add_gid = bnxt_re_add_gid, 1331 1332 .alloc_hw_port_stats = bnxt_re_ib_alloc_hw_port_stats, ··· 1335 1334 .alloc_ucontext = bnxt_re_alloc_ucontext, 1336 1335 .create_ah = bnxt_re_create_ah, 1337 1336 .create_cq = bnxt_re_create_cq, 1337 + .create_user_cq = bnxt_re_create_user_cq, 1338 1338 .create_qp = bnxt_re_create_qp, 1339 1339 .create_srq = bnxt_re_create_srq, 1340 1340 .create_user_ah = bnxt_re_create_ah, ··· 1374 1372 .reg_user_mr = bnxt_re_reg_user_mr, 1375 1373 .reg_user_mr_dmabuf = bnxt_re_reg_user_mr_dmabuf, 1376 1374 .req_notify_cq = bnxt_re_req_notify_cq, 1377 - .resize_cq = bnxt_re_resize_cq, 1375 + .resize_user_cq = bnxt_re_resize_cq, 1378 1376 .create_flow = bnxt_re_create_flow, 1379 1377 .destroy_flow = bnxt_re_destroy_flow, 1380 1378 INIT_RDMA_OBJ_SIZE(ib_ah, bnxt_re_ah, ib_ah),

+96 -209

drivers/infiniband/hw/bnxt_re/qplib_fp.c

··· 792 792 return 0; 793 793 } 794 794 795 - /* QP */ 796 - 797 795 static int bnxt_qplib_alloc_init_swq(struct bnxt_qplib_q *que) 798 796 { 799 797 int indx; ··· 810 812 return 0; 811 813 } 812 814 815 + static int bnxt_re_setup_qp_swqs(struct bnxt_qplib_qp *qplqp) 816 + { 817 + struct bnxt_qplib_q *sq = &qplqp->sq; 818 + struct bnxt_qplib_q *rq = &qplqp->rq; 819 + int rc; 820 + 821 + if (qplqp->is_user) 822 + return 0; 823 + 824 + rc = bnxt_qplib_alloc_init_swq(sq); 825 + if (rc) 826 + return rc; 827 + 828 + if (!qplqp->srq) { 829 + rc = bnxt_qplib_alloc_init_swq(rq); 830 + if (rc) 831 + goto free_sq_swq; 832 + } 833 + 834 + return 0; 835 + free_sq_swq: 836 + kfree(sq->swq); 837 + return rc; 838 + } 839 + 840 + static void bnxt_qp_init_dbinfo(struct bnxt_qplib_res *res, struct bnxt_qplib_qp *qp) 841 + { 842 + struct bnxt_qplib_q *sq = &qp->sq; 843 + struct bnxt_qplib_q *rq = &qp->rq; 844 + 845 + sq->dbinfo.hwq = &sq->hwq; 846 + sq->dbinfo.xid = qp->id; 847 + sq->dbinfo.db = qp->dpi->dbr; 848 + sq->dbinfo.max_slot = bnxt_qplib_set_sq_max_slot(qp->wqe_mode); 849 + sq->dbinfo.flags = 0; 850 + if (rq->max_wqe) { 851 + rq->dbinfo.hwq = &rq->hwq; 852 + rq->dbinfo.xid = qp->id; 853 + rq->dbinfo.db = qp->dpi->dbr; 854 + rq->dbinfo.max_slot = bnxt_qplib_set_rq_max_slot(rq->wqe_size); 855 + rq->dbinfo.flags = 0; 856 + } 857 + } 858 + 859 + static void bnxt_qplib_init_psn_ptr(struct bnxt_qplib_qp *qp, int size) 860 + { 861 + struct bnxt_qplib_hwq *sq_hwq; 862 + struct bnxt_qplib_q *sq; 863 + u64 fpsne, psn_pg; 864 + u16 indx_pad = 0; 865 + 866 + sq = &qp->sq; 867 + sq_hwq = &sq->hwq; 868 + /* First psn entry */ 869 + fpsne = (u64)bnxt_qplib_get_qe(sq_hwq, sq_hwq->depth, &psn_pg); 870 + if (!IS_ALIGNED(fpsne, PAGE_SIZE)) 871 + indx_pad = (fpsne & ~PAGE_MASK) / size; 872 + sq_hwq->pad_pgofft = indx_pad; 873 + sq_hwq->pad_pg = (u64 *)psn_pg; 874 + sq_hwq->pad_stride = size; 875 + } 876 + 877 + /* QP */ 813 878 int bnxt_qplib_create_qp1(struct bnxt_qplib_res *res, struct 
bnxt_qplib_qp *qp) 814 879 { 815 - struct bnxt_qplib_hwq_attr hwq_attr = {}; 816 880 struct bnxt_qplib_rcfw *rcfw = res->rcfw; 817 881 struct creq_create_qp1_resp resp = {}; 818 882 struct bnxt_qplib_cmdqmsg msg = {}; ··· 883 823 struct cmdq_create_qp1 req = {}; 884 824 struct bnxt_qplib_pbl *pbl; 885 825 u32 qp_flags = 0; 886 - u8 pg_sz_lvl; 887 826 u32 tbl_indx; 888 827 int rc; 889 828 ··· 896 837 req.qp_handle = cpu_to_le64(qp->qp_handle); 897 838 898 839 /* SQ */ 899 - hwq_attr.res = res; 900 - hwq_attr.sginfo = &sq->sg_info; 901 - hwq_attr.stride = sizeof(struct sq_sge); 902 - hwq_attr.depth = bnxt_qplib_get_depth(sq, qp->wqe_mode, false); 903 - hwq_attr.type = HWQ_TYPE_QUEUE; 904 - rc = bnxt_qplib_alloc_init_hwq(&sq->hwq, &hwq_attr); 905 - if (rc) 906 - return rc; 840 + sq->max_sw_wqe = bnxt_qplib_get_depth(sq, qp->wqe_mode, true); 841 + req.sq_size = cpu_to_le32(sq->max_sw_wqe); 842 + req.sq_pg_size_sq_lvl = sq->hwq.pg_sz_lvl; 907 843 908 - rc = bnxt_qplib_alloc_init_swq(sq); 909 - if (rc) 910 - goto fail_sq; 911 - 912 - req.sq_size = cpu_to_le32(bnxt_qplib_set_sq_size(sq, qp->wqe_mode)); 913 844 pbl = &sq->hwq.pbl[PBL_LVL_0]; 914 845 req.sq_pbl = cpu_to_le64(pbl->pg_map_arr[0]); 915 - pg_sz_lvl = (bnxt_qplib_base_pg_size(&sq->hwq) << 916 - CMDQ_CREATE_QP1_SQ_PG_SIZE_SFT); 917 - pg_sz_lvl |= (sq->hwq.level & CMDQ_CREATE_QP1_SQ_LVL_MASK); 918 - req.sq_pg_size_sq_lvl = pg_sz_lvl; 919 846 req.sq_fwo_sq_sge = 920 847 cpu_to_le16((sq->max_sge & CMDQ_CREATE_QP1_SQ_SGE_MASK) << 921 848 CMDQ_CREATE_QP1_SQ_SGE_SFT); ··· 910 865 /* RQ */ 911 866 if (rq->max_wqe) { 912 867 rq->dbinfo.flags = 0; 913 - hwq_attr.res = res; 914 - hwq_attr.sginfo = &rq->sg_info; 915 - hwq_attr.stride = sizeof(struct sq_sge); 916 - hwq_attr.depth = bnxt_qplib_get_depth(rq, qp->wqe_mode, false); 917 - hwq_attr.type = HWQ_TYPE_QUEUE; 918 - rc = bnxt_qplib_alloc_init_hwq(&rq->hwq, &hwq_attr); 919 - if (rc) 920 - goto sq_swq; 921 - rc = bnxt_qplib_alloc_init_swq(rq); 922 - if (rc) 923 - goto 
fail_rq; 924 868 req.rq_size = cpu_to_le32(rq->max_wqe); 925 869 pbl = &rq->hwq.pbl[PBL_LVL_0]; 926 870 req.rq_pbl = cpu_to_le64(pbl->pg_map_arr[0]); 927 - pg_sz_lvl = (bnxt_qplib_base_pg_size(&rq->hwq) << 928 - CMDQ_CREATE_QP1_RQ_PG_SIZE_SFT); 929 - pg_sz_lvl |= (rq->hwq.level & CMDQ_CREATE_QP1_RQ_LVL_MASK); 930 - req.rq_pg_size_rq_lvl = pg_sz_lvl; 871 + req.rq_pg_size_rq_lvl = rq->hwq.pg_sz_lvl; 931 872 req.rq_fwo_rq_sge = 932 873 cpu_to_le16((rq->max_sge & 933 874 CMDQ_CREATE_QP1_RQ_SGE_MASK) << ··· 924 893 rc = bnxt_qplib_alloc_qp_hdr_buf(res, qp); 925 894 if (rc) { 926 895 rc = -ENOMEM; 927 - goto rq_rwq; 896 + return rc; 928 897 } 929 898 qp_flags |= CMDQ_CREATE_QP1_QP_FLAGS_RESERVED_LKEY_ENABLE; 930 899 req.qp_flags = cpu_to_le32(qp_flags); ··· 937 906 938 907 qp->id = le32_to_cpu(resp.xid); 939 908 qp->cur_qp_state = CMDQ_MODIFY_QP_NEW_STATE_RESET; 940 - qp->cctx = res->cctx; 941 - sq->dbinfo.hwq = &sq->hwq; 942 - sq->dbinfo.xid = qp->id; 943 - sq->dbinfo.db = qp->dpi->dbr; 944 - sq->dbinfo.max_slot = bnxt_qplib_set_sq_max_slot(qp->wqe_mode); 945 - if (rq->max_wqe) { 946 - rq->dbinfo.hwq = &rq->hwq; 947 - rq->dbinfo.xid = qp->id; 948 - rq->dbinfo.db = qp->dpi->dbr; 949 - rq->dbinfo.max_slot = bnxt_qplib_set_rq_max_slot(rq->wqe_size); 950 - } 909 + 910 + rc = bnxt_re_setup_qp_swqs(qp); 911 + if (rc) 912 + goto destroy_qp; 913 + bnxt_qp_init_dbinfo(res, qp); 914 + 951 915 tbl_indx = map_qp_id_to_tbl_indx(qp->id, rcfw); 952 916 rcfw->qp_tbl[tbl_indx].qp_id = qp->id; 953 917 rcfw->qp_tbl[tbl_indx].qp_handle = (void *)qp; 954 918 955 919 return 0; 956 920 921 + destroy_qp: 922 + bnxt_qplib_destroy_qp(res, qp); 957 923 fail: 958 924 bnxt_qplib_free_qp_hdr_buf(res, qp); 959 - rq_rwq: 960 - kfree(rq->swq); 961 - fail_rq: 962 - bnxt_qplib_free_hwq(res, &rq->hwq); 963 - sq_swq: 964 - kfree(sq->swq); 965 - fail_sq: 966 - bnxt_qplib_free_hwq(res, &sq->hwq); 967 925 return rc; 968 - } 969 - 970 - static void bnxt_qplib_init_psn_ptr(struct bnxt_qplib_qp *qp, int size) 
971 - { 972 - struct bnxt_qplib_hwq *hwq; 973 - struct bnxt_qplib_q *sq; 974 - u64 fpsne, psn_pg; 975 - u16 indx_pad = 0; 976 - 977 - sq = &qp->sq; 978 - hwq = &sq->hwq; 979 - /* First psn entry */ 980 - fpsne = (u64)bnxt_qplib_get_qe(hwq, hwq->depth, &psn_pg); 981 - if (!IS_ALIGNED(fpsne, PAGE_SIZE)) 982 - indx_pad = (fpsne & ~PAGE_MASK) / size; 983 - hwq->pad_pgofft = indx_pad; 984 - hwq->pad_pg = (u64 *)psn_pg; 985 - hwq->pad_stride = size; 986 926 } 987 927 988 928 int bnxt_qplib_create_qp(struct bnxt_qplib_res *res, struct bnxt_qplib_qp *qp) 989 929 { 990 930 struct bnxt_qplib_rcfw *rcfw = res->rcfw; 991 - struct bnxt_qplib_hwq_attr hwq_attr = {}; 992 - struct bnxt_qplib_sg_info sginfo = {}; 993 931 struct creq_create_qp_resp resp = {}; 994 932 struct bnxt_qplib_cmdqmsg msg = {}; 995 933 struct bnxt_qplib_q *sq = &qp->sq; 996 934 struct bnxt_qplib_q *rq = &qp->rq; 997 935 struct cmdq_create_qp req = {}; 998 - int rc, req_size, psn_sz = 0; 999 - struct bnxt_qplib_hwq *xrrq; 1000 936 struct bnxt_qplib_pbl *pbl; 1001 937 u32 qp_flags = 0; 1002 - u8 pg_sz_lvl; 1003 938 u32 tbl_indx; 1004 939 u16 nsge; 940 + int rc; 1005 941 1006 - qp->is_host_msn_tbl = _is_host_msn_table(res->dattr->dev_cap_flags2); 1007 942 sq->dbinfo.flags = 0; 1008 943 bnxt_qplib_rcfw_cmd_prep((struct cmdq_base *)&req, 1009 944 CMDQ_BASE_OPCODE_CREATE_QP, ··· 981 984 req.qp_handle = cpu_to_le64(qp->qp_handle); 982 985 983 986 /* SQ */ 984 - if (qp->type == CMDQ_CREATE_QP_TYPE_RC) { 985 - psn_sz = bnxt_qplib_is_chip_gen_p5_p7(res->cctx) ? 986 - sizeof(struct sq_psn_search_ext) : 987 - sizeof(struct sq_psn_search); 988 - 989 - if (qp->is_host_msn_tbl) { 990 - psn_sz = sizeof(struct sq_msn_search); 991 - qp->msn = 0; 992 - } 993 - } 994 - 995 - hwq_attr.res = res; 996 - hwq_attr.sginfo = &sq->sg_info; 997 - hwq_attr.stride = sizeof(struct sq_sge); 998 - hwq_attr.depth = bnxt_qplib_get_depth(sq, qp->wqe_mode, true); 999 - hwq_attr.aux_stride = psn_sz; 1000 - hwq_attr.aux_depth = psn_sz ? 
bnxt_qplib_set_sq_size(sq, qp->wqe_mode) 1001 - : 0; 1002 - /* Update msn tbl size */ 1003 - if (qp->is_host_msn_tbl && psn_sz) { 1004 - if (qp->wqe_mode == BNXT_QPLIB_WQE_MODE_STATIC) 1005 - hwq_attr.aux_depth = 1006 - roundup_pow_of_two(bnxt_qplib_set_sq_size(sq, qp->wqe_mode)); 1007 - else 1008 - hwq_attr.aux_depth = 1009 - roundup_pow_of_two(bnxt_qplib_set_sq_size(sq, qp->wqe_mode)) / 2; 1010 - qp->msn_tbl_sz = hwq_attr.aux_depth; 1011 - qp->msn = 0; 1012 - } 1013 - 1014 - hwq_attr.type = HWQ_TYPE_QUEUE; 1015 - rc = bnxt_qplib_alloc_init_hwq(&sq->hwq, &hwq_attr); 1016 - if (rc) 1017 - return rc; 1018 - 1019 - if (!sq->hwq.is_user) { 1020 - rc = bnxt_qplib_alloc_init_swq(sq); 1021 - if (rc) 1022 - goto fail_sq; 1023 - 1024 - if (psn_sz) 1025 - bnxt_qplib_init_psn_ptr(qp, psn_sz); 1026 - } 1027 - req.sq_size = cpu_to_le32(bnxt_qplib_set_sq_size(sq, qp->wqe_mode)); 987 + req.sq_size = cpu_to_le32(sq->max_sw_wqe); 1028 988 pbl = &sq->hwq.pbl[PBL_LVL_0]; 1029 989 req.sq_pbl = cpu_to_le64(pbl->pg_map_arr[0]); 1030 - pg_sz_lvl = (bnxt_qplib_base_pg_size(&sq->hwq) << 1031 - CMDQ_CREATE_QP_SQ_PG_SIZE_SFT); 1032 - pg_sz_lvl |= (sq->hwq.level & CMDQ_CREATE_QP_SQ_LVL_MASK); 1033 - req.sq_pg_size_sq_lvl = pg_sz_lvl; 990 + req.sq_pg_size_sq_lvl = sq->hwq.pg_sz_lvl; 1034 991 req.sq_fwo_sq_sge = 1035 992 cpu_to_le16(((sq->max_sge & CMDQ_CREATE_QP_SQ_SGE_MASK) << 1036 993 CMDQ_CREATE_QP_SQ_SGE_SFT) | 0); ··· 993 1042 /* RQ */ 994 1043 if (!qp->srq) { 995 1044 rq->dbinfo.flags = 0; 996 - hwq_attr.res = res; 997 - hwq_attr.sginfo = &rq->sg_info; 998 - hwq_attr.stride = sizeof(struct sq_sge); 999 - hwq_attr.depth = bnxt_qplib_get_depth(rq, qp->wqe_mode, false); 1000 - hwq_attr.aux_stride = 0; 1001 - hwq_attr.aux_depth = 0; 1002 - hwq_attr.type = HWQ_TYPE_QUEUE; 1003 - rc = bnxt_qplib_alloc_init_hwq(&rq->hwq, &hwq_attr); 1004 - if (rc) 1005 - goto sq_swq; 1006 - if (!rq->hwq.is_user) { 1007 - rc = bnxt_qplib_alloc_init_swq(rq); 1008 - if (rc) 1009 - goto fail_rq; 1010 - } 1011 - 
1012 1045 req.rq_size = cpu_to_le32(rq->max_wqe); 1013 1046 pbl = &rq->hwq.pbl[PBL_LVL_0]; 1014 1047 req.rq_pbl = cpu_to_le64(pbl->pg_map_arr[0]); 1015 - pg_sz_lvl = (bnxt_qplib_base_pg_size(&rq->hwq) << 1016 - CMDQ_CREATE_QP_RQ_PG_SIZE_SFT); 1017 - pg_sz_lvl |= (rq->hwq.level & CMDQ_CREATE_QP_RQ_LVL_MASK); 1018 - req.rq_pg_size_rq_lvl = pg_sz_lvl; 1048 + req.rq_pg_size_rq_lvl = rq->hwq.pg_sz_lvl; 1019 1049 nsge = (qp->wqe_mode == BNXT_QPLIB_WQE_MODE_STATIC) ? 1020 1050 6 : rq->max_sge; 1021 1051 req.rq_fwo_rq_sge = ··· 1022 1090 req.qp_flags = cpu_to_le32(qp_flags); 1023 1091 1024 1092 /* ORRQ and IRRQ */ 1025 - if (psn_sz) { 1026 - xrrq = &qp->orrq; 1027 - xrrq->max_elements = 1028 - ORD_LIMIT_TO_ORRQ_SLOTS(qp->max_rd_atomic); 1029 - req_size = xrrq->max_elements * 1030 - BNXT_QPLIB_MAX_ORRQE_ENTRY_SIZE + PAGE_SIZE - 1; 1031 - req_size &= ~(PAGE_SIZE - 1); 1032 - sginfo.pgsize = req_size; 1033 - sginfo.pgshft = PAGE_SHIFT; 1034 - 1035 - hwq_attr.res = res; 1036 - hwq_attr.sginfo = &sginfo; 1037 - hwq_attr.depth = xrrq->max_elements; 1038 - hwq_attr.stride = BNXT_QPLIB_MAX_ORRQE_ENTRY_SIZE; 1039 - hwq_attr.aux_stride = 0; 1040 - hwq_attr.aux_depth = 0; 1041 - hwq_attr.type = HWQ_TYPE_CTX; 1042 - rc = bnxt_qplib_alloc_init_hwq(xrrq, &hwq_attr); 1043 - if (rc) 1044 - goto rq_swq; 1045 - pbl = &xrrq->pbl[PBL_LVL_0]; 1046 - req.orrq_addr = cpu_to_le64(pbl->pg_map_arr[0]); 1047 - 1048 - xrrq = &qp->irrq; 1049 - xrrq->max_elements = IRD_LIMIT_TO_IRRQ_SLOTS( 1050 - qp->max_dest_rd_atomic); 1051 - req_size = xrrq->max_elements * 1052 - BNXT_QPLIB_MAX_IRRQE_ENTRY_SIZE + PAGE_SIZE - 1; 1053 - req_size &= ~(PAGE_SIZE - 1); 1054 - sginfo.pgsize = req_size; 1055 - hwq_attr.depth = xrrq->max_elements; 1056 - hwq_attr.stride = BNXT_QPLIB_MAX_IRRQE_ENTRY_SIZE; 1057 - rc = bnxt_qplib_alloc_init_hwq(xrrq, &hwq_attr); 1058 - if (rc) 1059 - goto fail_orrq; 1060 - 1061 - pbl = &xrrq->pbl[PBL_LVL_0]; 1062 - req.irrq_addr = cpu_to_le64(pbl->pg_map_arr[0]); 1093 + if (qp->psn_sz) { 1094 
+ req.orrq_addr = cpu_to_le64(bnxt_qplib_get_base_addr(&qp->orrq)); 1095 + req.irrq_addr = cpu_to_le64(bnxt_qplib_get_base_addr(&qp->irrq)); 1063 1096 } 1064 1097 req.pd_id = cpu_to_le32(qp->pd->id); 1065 1098 ··· 1032 1135 sizeof(resp), 0); 1033 1136 rc = bnxt_qplib_rcfw_send_message(rcfw, &msg); 1034 1137 if (rc) 1035 - goto fail; 1138 + return rc; 1036 1139 1037 1140 qp->id = le32_to_cpu(resp.xid); 1141 + 1142 + if (!qp->is_user) { 1143 + rc = bnxt_re_setup_qp_swqs(qp); 1144 + if (rc) 1145 + goto destroy_qp; 1146 + } 1147 + bnxt_qp_init_dbinfo(res, qp); 1148 + if (qp->psn_sz) 1149 + bnxt_qplib_init_psn_ptr(qp, qp->psn_sz); 1150 + 1038 1151 qp->cur_qp_state = CMDQ_MODIFY_QP_NEW_STATE_RESET; 1039 1152 INIT_LIST_HEAD(&qp->sq_flush); 1040 1153 INIT_LIST_HEAD(&qp->rq_flush); 1041 1154 qp->cctx = res->cctx; 1042 - sq->dbinfo.hwq = &sq->hwq; 1043 - sq->dbinfo.xid = qp->id; 1044 - sq->dbinfo.db = qp->dpi->dbr; 1045 - sq->dbinfo.max_slot = bnxt_qplib_set_sq_max_slot(qp->wqe_mode); 1046 - if (rq->max_wqe) { 1047 - rq->dbinfo.hwq = &rq->hwq; 1048 - rq->dbinfo.xid = qp->id; 1049 - rq->dbinfo.db = qp->dpi->dbr; 1050 - rq->dbinfo.max_slot = bnxt_qplib_set_rq_max_slot(rq->wqe_size); 1051 - } 1052 1155 spin_lock_bh(&rcfw->tbl_lock); 1053 1156 tbl_indx = map_qp_id_to_tbl_indx(qp->id, rcfw); 1054 1157 rcfw->qp_tbl[tbl_indx].qp_id = qp->id; ··· 1056 1159 spin_unlock_bh(&rcfw->tbl_lock); 1057 1160 1058 1161 return 0; 1059 - fail: 1060 - bnxt_qplib_free_hwq(res, &qp->irrq); 1061 - fail_orrq: 1062 - bnxt_qplib_free_hwq(res, &qp->orrq); 1063 - rq_swq: 1064 - kfree(rq->swq); 1065 - fail_rq: 1066 - bnxt_qplib_free_hwq(res, &rq->hwq); 1067 - sq_swq: 1068 - kfree(sq->swq); 1069 - fail_sq: 1070 - bnxt_qplib_free_hwq(res, &sq->hwq); 1162 + destroy_qp: 1163 + bnxt_qplib_destroy_qp(res, qp); 1071 1164 return rc; 1072 1165 } 1073 1166
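The removed RQ/ORRQ/IRRQ code in `bnxt_qplib_create_qp()` did two small computations inline:

- it rounded a ring size up to a whole page (`+ PAGE_SIZE - 1` followed by `&= ~(PAGE_SIZE - 1)`);
- it packed a page-size code and a PBL level into one byte.

After the refactor, the packed byte is read from `rq->hwq.pg_sz_lvl`, presumably filled in when the HWQ is set up. Below is a minimal user-space sketch of both computations, assuming a 4 KiB page. `DEMO_PG_SIZE_SFT` and `DEMO_LVL_MASK` are hypothetical placeholders, not the real `CMDQ_CREATE_QP_RQ_PG_SIZE_SFT` / `CMDQ_CREATE_QP_RQ_LVL_MASK` values from `roce_hsi.h`.

```c
#include <stdint.h>

/* Assumptions: 4 KiB pages and made-up shift/mask values. */
#define DEMO_PAGE_SIZE   4096u
#define DEMO_PG_SIZE_SFT 4u    /* hypothetical */
#define DEMO_LVL_MASK    0x3u  /* hypothetical */

/* Round a byte count up to whole pages: add PAGE_SIZE - 1, then clear
 * the low bits. Only valid for a power-of-two page size. */
static uint32_t demo_page_align(uint32_t bytes)
{
	return (bytes + DEMO_PAGE_SIZE - 1) & ~(DEMO_PAGE_SIZE - 1);
}

/* Size of a context ring holding n entries of `stride` bytes each,
 * mirroring how the removed code sized the ORRQ/IRRQ backing pages. */
static uint32_t demo_ring_bytes(uint32_t n, uint32_t stride)
{
	return demo_page_align(n * stride);
}

/* Pack the encoded page size into the upper bits and the PBL level into
 * the masked lower bits, as the removed pg_sz_lvl code did. */
static uint8_t demo_pack_pg_sz_lvl(uint8_t pg_size_code, uint8_t level)
{
	return (uint8_t)((pg_size_code << DEMO_PG_SIZE_SFT) |
			 (level & DEMO_LVL_MASK));
}
```

Caching the packed byte means every consumer of the HWQ sends the same value to firmware, instead of each call site recomputing it.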

+8

drivers/infiniband/hw/bnxt_re/qplib_fp.h

··· 279 279 u8 wqe_mode; 280 280 u8 state; 281 281 u8 cur_qp_state; 282 + u8 is_user; 282 283 u64 modify_flags; 283 284 u32 ext_modify_flags; 284 285 u32 max_inline_data; ··· 345 344 struct list_head rq_flush; 346 345 u32 msn; 347 346 u32 msn_tbl_sz; 347 + u32 psn_sz; 348 348 bool is_host_msn_tbl; 349 349 u8 tos_dscp; 350 350 u32 ugid_index; 351 + u16 dev_cap_flags; 351 352 u32 rate_limit; 352 353 u8 shaper_allocation_status; 353 354 }; ··· 618 615 static inline void bnxt_qplib_swq_mod_start(struct bnxt_qplib_q *que, u32 idx) 619 616 { 620 617 que->swq_start = que->swq[idx].next_idx; 618 + } 619 + 620 + static inline u32 bnxt_qplib_get_stride(void) 621 + { 622 + return sizeof(struct sq_sge); 621 623 } 622 624 623 625 static inline u32 bnxt_qplib_get_depth(struct bnxt_qplib_q *que, u8 wqe_mode, bool is_sq)

+45 -2

drivers/infiniband/hw/bnxt_re/qplib_res.c

··· 46 46 #include <linux/if_vlan.h> 47 47 #include <linux/vmalloc.h> 48 48 #include <rdma/ib_verbs.h> 49 - #include <rdma/ib_umem.h> 49 + #include <rdma/iter.h> 50 50 51 51 #include "roce_hsi.h" 52 52 #include "qplib_res.h" ··· 683 683 } 684 684 685 685 /* DPIs */ 686 + int bnxt_qplib_alloc_uc_dpi(struct bnxt_qplib_res *res, struct bnxt_qplib_dpi *dpi) 687 + { 688 + struct bnxt_qplib_dpi_tbl *dpit = &res->dpi_tbl; 689 + struct bnxt_qplib_reg_desc *reg; 690 + u32 bit_num; 691 + int rc = 0; 692 + 693 + reg = &dpit->wcreg; 694 + mutex_lock(&res->dpi_tbl_lock); 695 + bit_num = find_first_bit(dpit->tbl, dpit->max); 696 + if (bit_num >= dpit->max) { 697 + rc = -ENOMEM; 698 + goto unlock; 699 + } 700 + /* Found unused DPI */ 701 + clear_bit(bit_num, dpit->tbl); 702 + dpi->bit = bit_num; 703 + dpi->dpi = bit_num + (reg->offset - dpit->ucreg.offset) / PAGE_SIZE; 704 + dpi->umdbr = reg->bar_base + reg->offset + bit_num * PAGE_SIZE; 705 + unlock: 706 + mutex_unlock(&res->dpi_tbl_lock); 707 + return rc; 708 + } 709 + 710 + int bnxt_qplib_free_uc_dpi(struct bnxt_qplib_res *res, struct bnxt_qplib_dpi *dpi) 711 + { 712 + struct bnxt_qplib_dpi_tbl *dpit = &res->dpi_tbl; 713 + int rc = 0; 714 + 715 + mutex_lock(&res->dpi_tbl_lock); 716 + if (dpi->bit >= dpit->max) { 717 + rc = -EINVAL; 718 + goto unlock; 719 + } 720 + 721 + if (test_and_set_bit(dpi->bit, dpit->tbl)) 722 + rc = -EINVAL; 723 + memset(dpi, 0, sizeof(*dpi)); 724 + unlock: 725 + mutex_unlock(&res->dpi_tbl_lock); 726 + return rc; 727 + } 728 + 686 729 int bnxt_qplib_alloc_dpi(struct bnxt_qplib_res *res, 687 730 struct bnxt_qplib_dpi *dpi, 688 731 void *app, u8 type) ··· 833 790 if (dev_attr->max_dpi) 834 791 dpit->max = min_t(u32, dpit->max, dev_attr->max_dpi); 835 792 836 - dpit->app_tbl = kcalloc(dpit->max, sizeof(void *), GFP_KERNEL); 793 + dpit->app_tbl = kzalloc_objs(void *, dpit->max); 837 794 if (!dpit->app_tbl) 838 795 return -ENOMEM; 839 796
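The new `bnxt_qplib_alloc_uc_dpi()` / `bnxt_qplib_free_uc_dpi()` pair treats `dpit->tbl` as a bitmap in which a set bit means "free":

- `find_first_bit()` picks the lowest free slot and `clear_bit()` marks it used.
- Release uses `test_and_set_bit()`, so freeing a slot that is already free reports `-EINVAL`.

Below is a single-threaded user-space sketch of the same bookkeeping. It uses one 64-bit word in place of the kernel bitmap and omits the `dpi_tbl_lock` mutex. The BAR base and register offsets are illustrative inputs. The DPI number and doorbell address use the same formulas as the diff, including its use of the `wcreg` offset.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>

#define DEMO_PAGE_SIZE 4096u

/* Simplified DPI table: bit i set => DPI slot i is free (max <= 64). */
struct demo_dpi_tbl {
	uint64_t free_map;
	uint32_t max;
	uint64_t bar_base;     /* assumed BAR base address */
	uint64_t ucreg_offset; /* assumed offset of the UC doorbell region */
	uint64_t wcreg_offset; /* offset the diff reads via dpit->wcreg */
};

struct demo_dpi {
	uint32_t bit;
	uint32_t dpi;
	uint64_t umdbr;
};

static int demo_alloc_dpi(struct demo_dpi_tbl *t, struct demo_dpi *d)
{
	uint32_t bit;

	/* Equivalent of find_first_bit(): lowest free slot. */
	for (bit = 0; bit < t->max; bit++)
		if (t->free_map & (1ull << bit))
			break;
	if (bit >= t->max)
		return -ENOMEM;

	t->free_map &= ~(1ull << bit); /* clear_bit(): slot now in use */
	d->bit = bit;
	d->dpi = bit + (uint32_t)((t->wcreg_offset - t->ucreg_offset) /
				  DEMO_PAGE_SIZE);
	d->umdbr = t->bar_base + t->wcreg_offset +
		   (uint64_t)bit * DEMO_PAGE_SIZE;
	return 0;
}

static int demo_free_dpi(struct demo_dpi_tbl *t, struct demo_dpi *d)
{
	int rc = 0;

	if (d->bit >= t->max)
		return -EINVAL;
	/* test_and_set_bit(): an already-set bit means a double free. */
	if (t->free_map & (1ull << d->bit))
		rc = -EINVAL;
	t->free_map |= 1ull << d->bit;
	memset(d, 0, sizeof(*d));
	return rc;
}
```

Keeping "free" as the set state lets a single find-first-set scan locate an available slot directly.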

+10

drivers/infiniband/hw/bnxt_re/qplib_res.h

··· 198 198 u32 cons; /* raw */ 199 199 u8 cp_bit; 200 200 u8 is_user; 201 + u8 pg_sz_lvl; 201 202 u64 *pad_pg; 202 203 u32 pad_stride; 203 204 u32 pad_pgofft; ··· 359 358 RING_ALLOC_REQ_RING_TYPE_ROCE_CMPL; 360 359 } 361 360 361 + static inline u64 bnxt_qplib_get_base_addr(struct bnxt_qplib_hwq *hwq) 362 + { 363 + return hwq->pbl[PBL_LVL_0].pg_map_arr[0]; 364 + } 365 + 362 366 static inline u8 bnxt_qplib_base_pg_size(struct bnxt_qplib_hwq *hwq) 363 367 { 364 368 u8 pg_size = BNXT_QPLIB_HWRM_PG_SIZE_4K; ··· 435 429 struct bnxt_qplib_dpi *dpi, 436 430 void *app, u8 type); 437 431 int bnxt_qplib_dealloc_dpi(struct bnxt_qplib_res *res, 432 + struct bnxt_qplib_dpi *dpi); 433 + int bnxt_qplib_alloc_uc_dpi(struct bnxt_qplib_res *res, 434 + struct bnxt_qplib_dpi *dpi); 435 + int bnxt_qplib_free_uc_dpi(struct bnxt_qplib_res *res, 438 436 struct bnxt_qplib_dpi *dpi); 439 437 void bnxt_qplib_cleanup_res(struct bnxt_qplib_res *res); 440 438 int bnxt_qplib_init_res(struct bnxt_qplib_res *res);

+469

drivers/infiniband/hw/bnxt_re/uapi.c

··· 1 + // SPDX-License-Identifier: GPL-2.0 OR BSD-2-Clause 2 + /* 3 + * Copyright (c) 2025, Broadcom. All rights reserved. The term 4 + * Broadcom refers to Broadcom Limited and/or its subsidiaries. 5 + * 6 + * Description: uapi interpreter 7 + */ 8 + 9 + #include <rdma/ib_addr.h> 10 + #include <rdma/uverbs_types.h> 11 + #include <rdma/uverbs_std_types.h> 12 + #include <rdma/ib_user_ioctl_cmds.h> 13 + #define UVERBS_MODULE_NAME bnxt_re 14 + #include <rdma/uverbs_named_ioctl.h> 15 + #include <rdma/bnxt_re-abi.h> 16 + 17 + #include "roce_hsi.h" 18 + #include "qplib_res.h" 19 + #include "qplib_sp.h" 20 + #include "qplib_fp.h" 21 + #include "qplib_rcfw.h" 22 + #include "bnxt_re.h" 23 + #include "ib_verbs.h" 24 + 25 + static struct bnxt_re_cq *bnxt_re_search_for_cq(struct bnxt_re_dev *rdev, u32 cq_id) 26 + { 27 + struct bnxt_re_cq *cq = NULL, *tmp_cq; 28 + 29 + hash_for_each_possible(rdev->cq_hash, tmp_cq, hash_entry, cq_id) { 30 + if (tmp_cq->qplib_cq.id == cq_id) { 31 + cq = tmp_cq; 32 + break; 33 + } 34 + } 35 + return cq; 36 + } 37 + 38 + static struct bnxt_re_srq *bnxt_re_search_for_srq(struct bnxt_re_dev *rdev, u32 srq_id) 39 + { 40 + struct bnxt_re_srq *srq = NULL, *tmp_srq; 41 + 42 + hash_for_each_possible(rdev->srq_hash, tmp_srq, hash_entry, srq_id) { 43 + if (tmp_srq->qplib_srq.id == srq_id) { 44 + srq = tmp_srq; 45 + break; 46 + } 47 + } 48 + return srq; 49 + } 50 + 51 + static int UVERBS_HANDLER(BNXT_RE_METHOD_NOTIFY_DRV)(struct uverbs_attr_bundle *attrs) 52 + { 53 + struct bnxt_re_ucontext *uctx; 54 + struct ib_ucontext *ib_uctx; 55 + 56 + ib_uctx = ib_uverbs_get_ucontext(attrs); 57 + if (IS_ERR(ib_uctx)) 58 + return PTR_ERR(ib_uctx); 59 + 60 + uctx = container_of(ib_uctx, struct bnxt_re_ucontext, ib_uctx); 61 + if (IS_ERR(uctx)) 62 + return PTR_ERR(uctx); 63 + 64 + bnxt_re_pacing_alert(uctx->rdev); 65 + return 0; 66 + } 67 + 68 + static int UVERBS_HANDLER(BNXT_RE_METHOD_ALLOC_PAGE)(struct uverbs_attr_bundle *attrs) 69 + { 70 + struct ib_uobject *uobj = 
uverbs_attr_get_uobject(attrs, BNXT_RE_ALLOC_PAGE_HANDLE); 71 + enum bnxt_re_alloc_page_type alloc_type; 72 + struct bnxt_re_user_mmap_entry *entry; 73 + enum bnxt_re_mmap_flag mmap_flag; 74 + struct bnxt_qplib_chip_ctx *cctx; 75 + struct bnxt_re_ucontext *uctx; 76 + struct ib_ucontext *ib_uctx; 77 + struct bnxt_re_dev *rdev; 78 + u64 mmap_offset; 79 + u32 length; 80 + u32 dpi; 81 + u64 addr; 82 + int err; 83 + 84 + ib_uctx = ib_uverbs_get_ucontext(attrs); 85 + if (IS_ERR(ib_uctx)) 86 + return PTR_ERR(ib_uctx); 87 + 88 + uctx = container_of(ib_uctx, struct bnxt_re_ucontext, ib_uctx); 89 + if (IS_ERR(uctx)) 90 + return PTR_ERR(uctx); 91 + 92 + err = uverbs_get_const(&alloc_type, attrs, BNXT_RE_ALLOC_PAGE_TYPE); 93 + if (err) 94 + return err; 95 + 96 + rdev = uctx->rdev; 97 + cctx = rdev->chip_ctx; 98 + 99 + switch (alloc_type) { 100 + case BNXT_RE_ALLOC_WC_PAGE: 101 + if (cctx->modes.db_push) { 102 + if (bnxt_qplib_alloc_dpi(&rdev->qplib_res, &uctx->wcdpi, 103 + uctx, BNXT_QPLIB_DPI_TYPE_WC)) 104 + return -ENOMEM; 105 + length = PAGE_SIZE; 106 + dpi = uctx->wcdpi.dpi; 107 + addr = (u64)uctx->wcdpi.umdbr; 108 + mmap_flag = BNXT_RE_MMAP_WC_DB; 109 + } else { 110 + return -EINVAL; 111 + } 112 + 113 + break; 114 + case BNXT_RE_ALLOC_DBR_BAR_PAGE: 115 + length = PAGE_SIZE; 116 + addr = (u64)rdev->pacing.dbr_bar_addr; 117 + mmap_flag = BNXT_RE_MMAP_DBR_BAR; 118 + break; 119 + 120 + case BNXT_RE_ALLOC_DBR_PAGE: 121 + length = PAGE_SIZE; 122 + addr = (u64)rdev->pacing.dbr_page; 123 + mmap_flag = BNXT_RE_MMAP_DBR_PAGE; 124 + break; 125 + 126 + default: 127 + return -EOPNOTSUPP; 128 + } 129 + 130 + entry = bnxt_re_mmap_entry_insert(uctx, addr, mmap_flag, &mmap_offset); 131 + if (!entry) 132 + return -ENOMEM; 133 + 134 + uobj->object = entry; 135 + uverbs_finalize_uobj_create(attrs, BNXT_RE_ALLOC_PAGE_HANDLE); 136 + err = uverbs_copy_to(attrs, BNXT_RE_ALLOC_PAGE_MMAP_OFFSET, 137 + &mmap_offset, sizeof(mmap_offset)); 138 + if (err) 139 + return err; 140 + 141 + err = 
uverbs_copy_to(attrs, BNXT_RE_ALLOC_PAGE_MMAP_LENGTH, 142 + &length, sizeof(length)); 143 + if (err) 144 + return err; 145 + 146 + err = uverbs_copy_to(attrs, BNXT_RE_ALLOC_PAGE_DPI, 147 + &dpi, sizeof(dpi)); 148 + if (err) 149 + return err; 150 + 151 + return 0; 152 + } 153 + 154 + static int alloc_page_obj_cleanup(struct ib_uobject *uobject, 155 + enum rdma_remove_reason why, 156 + struct uverbs_attr_bundle *attrs) 157 + { 158 + struct bnxt_re_user_mmap_entry *entry = uobject->object; 159 + struct bnxt_re_ucontext *uctx = entry->uctx; 160 + 161 + switch (entry->mmap_flag) { 162 + case BNXT_RE_MMAP_WC_DB: 163 + if (uctx && uctx->wcdpi.dbr) { 164 + struct bnxt_re_dev *rdev = uctx->rdev; 165 + 166 + bnxt_qplib_dealloc_dpi(&rdev->qplib_res, &uctx->wcdpi); 167 + uctx->wcdpi.dbr = NULL; 168 + } 169 + break; 170 + case BNXT_RE_MMAP_DBR_BAR: 171 + case BNXT_RE_MMAP_DBR_PAGE: 172 + break; 173 + default: 174 + goto exit; 175 + } 176 + rdma_user_mmap_entry_remove(&entry->rdma_entry); 177 + exit: 178 + return 0; 179 + } 180 + 181 + DECLARE_UVERBS_NAMED_METHOD(BNXT_RE_METHOD_ALLOC_PAGE, 182 + UVERBS_ATTR_IDR(BNXT_RE_ALLOC_PAGE_HANDLE, 183 + BNXT_RE_OBJECT_ALLOC_PAGE, 184 + UVERBS_ACCESS_NEW, 185 + UA_MANDATORY), 186 + UVERBS_ATTR_CONST_IN(BNXT_RE_ALLOC_PAGE_TYPE, 187 + enum bnxt_re_alloc_page_type, 188 + UA_MANDATORY), 189 + UVERBS_ATTR_PTR_OUT(BNXT_RE_ALLOC_PAGE_MMAP_OFFSET, 190 + UVERBS_ATTR_TYPE(u64), 191 + UA_MANDATORY), 192 + UVERBS_ATTR_PTR_OUT(BNXT_RE_ALLOC_PAGE_MMAP_LENGTH, 193 + UVERBS_ATTR_TYPE(u32), 194 + UA_MANDATORY), 195 + UVERBS_ATTR_PTR_OUT(BNXT_RE_ALLOC_PAGE_DPI, 196 + UVERBS_ATTR_TYPE(u32), 197 + UA_MANDATORY)); 198 + 199 + DECLARE_UVERBS_NAMED_METHOD_DESTROY(BNXT_RE_METHOD_DESTROY_PAGE, 200 + UVERBS_ATTR_IDR(BNXT_RE_DESTROY_PAGE_HANDLE, 201 + BNXT_RE_OBJECT_ALLOC_PAGE, 202 + UVERBS_ACCESS_DESTROY, 203 + UA_MANDATORY)); 204 + 205 + DECLARE_UVERBS_NAMED_OBJECT(BNXT_RE_OBJECT_ALLOC_PAGE, 206 + UVERBS_TYPE_ALLOC_IDR(alloc_page_obj_cleanup), 207 + 
&UVERBS_METHOD(BNXT_RE_METHOD_ALLOC_PAGE), 208 + &UVERBS_METHOD(BNXT_RE_METHOD_DESTROY_PAGE)); 209 + 210 + DECLARE_UVERBS_NAMED_METHOD(BNXT_RE_METHOD_NOTIFY_DRV); 211 + 212 + DECLARE_UVERBS_GLOBAL_METHODS(BNXT_RE_OBJECT_NOTIFY_DRV, 213 + &UVERBS_METHOD(BNXT_RE_METHOD_NOTIFY_DRV)); 214 + 215 + /* Toggle MEM */ 216 + static int UVERBS_HANDLER(BNXT_RE_METHOD_GET_TOGGLE_MEM)(struct uverbs_attr_bundle *attrs) 217 + { 218 + struct ib_uobject *uobj = uverbs_attr_get_uobject(attrs, BNXT_RE_TOGGLE_MEM_HANDLE); 219 + enum bnxt_re_mmap_flag mmap_flag = BNXT_RE_MMAP_TOGGLE_PAGE; 220 + enum bnxt_re_get_toggle_mem_type res_type; 221 + struct bnxt_re_user_mmap_entry *entry; 222 + struct bnxt_re_ucontext *uctx; 223 + struct ib_ucontext *ib_uctx; 224 + struct bnxt_re_dev *rdev; 225 + struct bnxt_re_srq *srq; 226 + u32 length = PAGE_SIZE; 227 + struct bnxt_re_cq *cq; 228 + u64 mem_offset; 229 + u32 offset = 0; 230 + u64 addr = 0; 231 + u32 res_id; 232 + int err; 233 + 234 + ib_uctx = ib_uverbs_get_ucontext(attrs); 235 + if (IS_ERR(ib_uctx)) 236 + return PTR_ERR(ib_uctx); 237 + 238 + err = uverbs_get_const(&res_type, attrs, BNXT_RE_TOGGLE_MEM_TYPE); 239 + if (err) 240 + return err; 241 + 242 + uctx = container_of(ib_uctx, struct bnxt_re_ucontext, ib_uctx); 243 + rdev = uctx->rdev; 244 + err = uverbs_copy_from(&res_id, attrs, BNXT_RE_TOGGLE_MEM_RES_ID); 245 + if (err) 246 + return err; 247 + 248 + switch (res_type) { 249 + case BNXT_RE_CQ_TOGGLE_MEM: 250 + cq = bnxt_re_search_for_cq(rdev, res_id); 251 + if (!cq) 252 + return -EINVAL; 253 + 254 + addr = (u64)cq->uctx_cq_page; 255 + break; 256 + case BNXT_RE_SRQ_TOGGLE_MEM: 257 + srq = bnxt_re_search_for_srq(rdev, res_id); 258 + if (!srq) 259 + return -EINVAL; 260 + 261 + addr = (u64)srq->uctx_srq_page; 262 + break; 263 + 264 + default: 265 + return -EOPNOTSUPP; 266 + } 267 + 268 + entry = bnxt_re_mmap_entry_insert(uctx, addr, mmap_flag, &mem_offset); 269 + if (!entry) 270 + return -ENOMEM; 271 + 272 + uobj->object = entry; 273 + 
uverbs_finalize_uobj_create(attrs, BNXT_RE_TOGGLE_MEM_HANDLE); 274 + err = uverbs_copy_to(attrs, BNXT_RE_TOGGLE_MEM_MMAP_PAGE, 275 + &mem_offset, sizeof(mem_offset)); 276 + if (err) 277 + return err; 278 + 279 + err = uverbs_copy_to(attrs, BNXT_RE_TOGGLE_MEM_MMAP_LENGTH, 280 + &length, sizeof(length)); 281 + if (err) 282 + return err; 283 + 284 + err = uverbs_copy_to(attrs, BNXT_RE_TOGGLE_MEM_MMAP_OFFSET, 285 + &offset, sizeof(offset)); 286 + if (err) 287 + return err; 288 + 289 + return 0; 290 + } 291 + 292 + static int get_toggle_mem_obj_cleanup(struct ib_uobject *uobject, 293 + enum rdma_remove_reason why, 294 + struct uverbs_attr_bundle *attrs) 295 + { 296 + struct bnxt_re_user_mmap_entry *entry = uobject->object; 297 + 298 + rdma_user_mmap_entry_remove(&entry->rdma_entry); 299 + return 0; 300 + } 301 + 302 + DECLARE_UVERBS_NAMED_METHOD(BNXT_RE_METHOD_GET_TOGGLE_MEM, 303 + UVERBS_ATTR_IDR(BNXT_RE_TOGGLE_MEM_HANDLE, 304 + BNXT_RE_OBJECT_GET_TOGGLE_MEM, 305 + UVERBS_ACCESS_NEW, 306 + UA_MANDATORY), 307 + UVERBS_ATTR_CONST_IN(BNXT_RE_TOGGLE_MEM_TYPE, 308 + enum bnxt_re_get_toggle_mem_type, 309 + UA_MANDATORY), 310 + UVERBS_ATTR_PTR_IN(BNXT_RE_TOGGLE_MEM_RES_ID, 311 + UVERBS_ATTR_TYPE(u32), 312 + UA_MANDATORY), 313 + UVERBS_ATTR_PTR_OUT(BNXT_RE_TOGGLE_MEM_MMAP_PAGE, 314 + UVERBS_ATTR_TYPE(u64), 315 + UA_MANDATORY), 316 + UVERBS_ATTR_PTR_OUT(BNXT_RE_TOGGLE_MEM_MMAP_OFFSET, 317 + UVERBS_ATTR_TYPE(u32), 318 + UA_MANDATORY), 319 + UVERBS_ATTR_PTR_OUT(BNXT_RE_TOGGLE_MEM_MMAP_LENGTH, 320 + UVERBS_ATTR_TYPE(u32), 321 + UA_MANDATORY)); 322 + 323 + DECLARE_UVERBS_NAMED_METHOD_DESTROY(BNXT_RE_METHOD_RELEASE_TOGGLE_MEM, 324 + UVERBS_ATTR_IDR(BNXT_RE_RELEASE_TOGGLE_MEM_HANDLE, 325 + BNXT_RE_OBJECT_GET_TOGGLE_MEM, 326 + UVERBS_ACCESS_DESTROY, 327 + UA_MANDATORY)); 328 + 329 + DECLARE_UVERBS_NAMED_OBJECT(BNXT_RE_OBJECT_GET_TOGGLE_MEM, 330 + UVERBS_TYPE_ALLOC_IDR(get_toggle_mem_obj_cleanup), 331 + &UVERBS_METHOD(BNXT_RE_METHOD_GET_TOGGLE_MEM), 332 + 
&UVERBS_METHOD(BNXT_RE_METHOD_RELEASE_TOGGLE_MEM)); 333 + 334 + static int UVERBS_HANDLER(BNXT_RE_METHOD_DBR_ALLOC)(struct uverbs_attr_bundle *attrs) 335 + { 336 + struct bnxt_re_db_region dbr = {}; 337 + struct bnxt_re_ucontext *uctx; 338 + struct bnxt_re_dbr_obj *obj; 339 + struct ib_ucontext *ib_uctx; 340 + struct bnxt_qplib_dpi *dpi; 341 + struct bnxt_re_dev *rdev; 342 + struct ib_uobject *uobj; 343 + u64 mmap_offset; 344 + int ret; 345 + 346 + ib_uctx = ib_uverbs_get_ucontext(attrs); 347 + if (IS_ERR(ib_uctx)) 348 + return PTR_ERR(ib_uctx); 349 + 350 + uctx = container_of(ib_uctx, struct bnxt_re_ucontext, ib_uctx); 351 + rdev = uctx->rdev; 352 + uobj = uverbs_attr_get_uobject(attrs, BNXT_RE_ALLOC_DBR_HANDLE); 353 + 354 + obj = kzalloc_obj(*obj); 355 + if (!obj) 356 + return -ENOMEM; 357 + 358 + dpi = &obj->dpi; 359 + ret = bnxt_qplib_alloc_uc_dpi(&rdev->qplib_res, dpi); 360 + if (ret) 361 + goto free_mem; 362 + 363 + obj->entry = bnxt_re_mmap_entry_insert(uctx, dpi->umdbr, 364 + BNXT_RE_MMAP_UC_DB, 365 + &mmap_offset); 366 + if (!obj->entry) { 367 + ret = -ENOMEM; 368 + goto free_dpi; 369 + } 370 + 371 + obj->rdev = rdev; 372 + uobj->object = obj; 373 + uverbs_finalize_uobj_create(attrs, BNXT_RE_ALLOC_DBR_HANDLE); 374 + 375 + dbr.umdbr = dpi->umdbr; 376 + dbr.dpi = dpi->dpi; 377 + ret = uverbs_copy_to_struct_or_zero(attrs, BNXT_RE_ALLOC_DBR_ATTR, 378 + &dbr, sizeof(dbr)); 379 + if (ret) 380 + return ret; 381 + 382 + ret = uverbs_copy_to(attrs, BNXT_RE_ALLOC_DBR_OFFSET, 383 + &mmap_offset, sizeof(mmap_offset)); 384 + if (ret) 385 + return ret; 386 + return 0; 387 + free_dpi: 388 + bnxt_qplib_free_uc_dpi(&rdev->qplib_res, dpi); 389 + free_mem: 390 + kfree(obj); 391 + return ret; 392 + } 393 + 394 + static int bnxt_re_dbr_cleanup(struct ib_uobject *uobject, 395 + enum rdma_remove_reason why, 396 + struct uverbs_attr_bundle *attrs) 397 + { 398 + struct bnxt_re_dbr_obj *obj = uobject->object; 399 + struct bnxt_re_dev *rdev = obj->rdev; 400 + 401 + 
rdma_user_mmap_entry_remove(&obj->entry->rdma_entry); 402 + bnxt_qplib_free_uc_dpi(&rdev->qplib_res, &obj->dpi); 403 + return 0; 404 + } 405 + 406 + static int UVERBS_HANDLER(BNXT_RE_METHOD_GET_DEFAULT_DBR)(struct uverbs_attr_bundle *attrs) 407 + { 408 + struct bnxt_re_db_region dpi = {}; 409 + struct bnxt_re_ucontext *uctx; 410 + struct ib_ucontext *ib_uctx; 411 + int ret; 412 + 413 + ib_uctx = ib_uverbs_get_ucontext(attrs); 414 + if (IS_ERR(ib_uctx)) 415 + return PTR_ERR(ib_uctx); 416 + 417 + uctx = container_of(ib_uctx, struct bnxt_re_ucontext, ib_uctx); 418 + dpi.umdbr = uctx->dpi.umdbr; 419 + dpi.dpi = uctx->dpi.dpi; 420 + 421 + ret = uverbs_copy_to_struct_or_zero(attrs, BNXT_RE_DEFAULT_DBR_ATTR, 422 + &dpi, sizeof(dpi)); 423 + if (ret) 424 + return ret; 425 + 426 + return 0; 427 + } 428 + 429 + DECLARE_UVERBS_NAMED_METHOD(BNXT_RE_METHOD_DBR_ALLOC, 430 + UVERBS_ATTR_IDR(BNXT_RE_ALLOC_DBR_HANDLE, 431 + BNXT_RE_OBJECT_DBR, 432 + UVERBS_ACCESS_NEW, 433 + UA_MANDATORY), 434 + UVERBS_ATTR_PTR_OUT(BNXT_RE_ALLOC_DBR_ATTR, 435 + UVERBS_ATTR_STRUCT(struct bnxt_re_db_region, 436 + umdbr), 437 + UA_MANDATORY), 438 + UVERBS_ATTR_PTR_OUT(BNXT_RE_ALLOC_DBR_OFFSET, 439 + UVERBS_ATTR_TYPE(u64), 440 + UA_MANDATORY)); 441 + 442 + DECLARE_UVERBS_NAMED_METHOD_DESTROY(BNXT_RE_METHOD_DBR_FREE, 443 + UVERBS_ATTR_IDR(BNXT_RE_FREE_DBR_HANDLE, 444 + BNXT_RE_OBJECT_DBR, 445 + UVERBS_ACCESS_DESTROY, 446 + UA_MANDATORY)); 447 + 448 + DECLARE_UVERBS_NAMED_OBJECT(BNXT_RE_OBJECT_DBR, 449 + UVERBS_TYPE_ALLOC_IDR(bnxt_re_dbr_cleanup), 450 + &UVERBS_METHOD(BNXT_RE_METHOD_DBR_ALLOC), 451 + &UVERBS_METHOD(BNXT_RE_METHOD_DBR_FREE)); 452 + 453 + DECLARE_UVERBS_NAMED_METHOD(BNXT_RE_METHOD_GET_DEFAULT_DBR, 454 + UVERBS_ATTR_PTR_OUT(BNXT_RE_DEFAULT_DBR_ATTR, 455 + UVERBS_ATTR_STRUCT(struct bnxt_re_db_region, 456 + umdbr), 457 + UA_MANDATORY)); 458 + 459 + DECLARE_UVERBS_GLOBAL_METHODS(BNXT_RE_OBJECT_DEFAULT_DBR, 460 + &UVERBS_METHOD(BNXT_RE_METHOD_GET_DEFAULT_DBR)); 461 + 462 + const struct 
uapi_definition bnxt_re_uapi_defs[] = { 463 + UAPI_DEF_CHAIN_OBJ_TREE_NAMED(BNXT_RE_OBJECT_ALLOC_PAGE), 464 + UAPI_DEF_CHAIN_OBJ_TREE_NAMED(BNXT_RE_OBJECT_NOTIFY_DRV), 465 + UAPI_DEF_CHAIN_OBJ_TREE_NAMED(BNXT_RE_OBJECT_GET_TOGGLE_MEM), 466 + UAPI_DEF_CHAIN_OBJ_TREE_NAMED(BNXT_RE_OBJECT_DBR), 467 + UAPI_DEF_CHAIN_OBJ_TREE_NAMED(BNXT_RE_OBJECT_DEFAULT_DBR), 468 + {} 469 + };
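`bnxt_re_search_for_cq()` and `bnxt_re_search_for_srq()` walk only the bucket selected by the ID (`hash_for_each_possible`). They still compare `qplib_cq.id` / `qplib_srq.id` on every entry, because different IDs can hash to the same bucket.

Below is a user-space sketch of that lookup with a hand-rolled chained table. The `id & (size - 1)` bucket function is a simplification chosen so the collision is easy to see; the kernel's `<linux/hashtable.h>` uses its own hash.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_HASH_BITS 4
#define DEMO_HASH_SIZE (1u << DEMO_HASH_BITS)

struct demo_cq {
	uint32_t id;
	struct demo_cq *next; /* bucket chain */
};

struct demo_cq_hash {
	struct demo_cq *bucket[DEMO_HASH_SIZE];
};

/* Simplified bucket index (not the kernel's hash function). */
static unsigned int demo_bucket(uint32_t id)
{
	return id & (DEMO_HASH_SIZE - 1);
}

static void demo_cq_add(struct demo_cq_hash *h, struct demo_cq *cq)
{
	unsigned int b = demo_bucket(cq->id);

	cq->next = h->bucket[b];
	h->bucket[b] = cq;
}

/* Visit only the candidate bucket, but confirm the exact ID, since
 * several IDs may share one bucket. */
static struct demo_cq *demo_search_for_cq(struct demo_cq_hash *h, uint32_t id)
{
	struct demo_cq *cq;

	for (cq = h->bucket[demo_bucket(id)]; cq; cq = cq->next)
		if (cq->id == id)
			return cq;
	return NULL;
}
```

In this sketch IDs 3 and 19 land in the same bucket. Without the equality check, a lookup for 19 could return the CQ with ID 3.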

+1 -1

drivers/infiniband/hw/cxgb4/mem.c

··· 32 32 33 33 #include <linux/module.h> 34 34 #include <linux/moduleparam.h> 35 - #include <rdma/ib_umem.h> 36 35 #include <linux/atomic.h> 37 36 #include <rdma/ib_user_verbs.h> 37 + #include <rdma/iter.h> 38 38 39 39 #include "iw_cxgb4.h" 40 40

+2 -4

drivers/infiniband/hw/efa/efa.h

··· 161 161 int efa_create_qp(struct ib_qp *ibqp, struct ib_qp_init_attr *init_attr, 162 162 struct ib_udata *udata); 163 163 int efa_destroy_cq(struct ib_cq *ibcq, struct ib_udata *udata); 164 - int efa_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr, 165 - struct uverbs_attr_bundle *attrs); 166 - int efa_create_cq_umem(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr, 167 - struct ib_umem *umem, struct uverbs_attr_bundle *attrs); 164 + int efa_create_user_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr, 165 + struct uverbs_attr_bundle *attrs); 168 166 struct ib_mr *efa_reg_mr(struct ib_pd *ibpd, u64 start, u64 length, 169 167 u64 virt_addr, int access_flags, 170 168 struct ib_dmah *dmah,

+18 -5

drivers/infiniband/hw/efa/efa_admin_cmds_defs.h

··· 1 1 /* SPDX-License-Identifier: GPL-2.0 OR BSD-2-Clause */ 2 2 /* 3 - * Copyright 2018-2025 Amazon.com, Inc. or its affiliates. All rights reserved. 3 + * Copyright 2018-2026 Amazon.com, Inc. or its affiliates. All rights reserved. 4 4 */ 5 5 6 6 #ifndef _EFA_ADMIN_CMDS_H_ ··· 38 38 EFA_ADMIN_DEVICE_ATTR = 1, 39 39 EFA_ADMIN_AENQ_CONFIG = 2, 40 40 EFA_ADMIN_NETWORK_ATTR = 3, 41 - EFA_ADMIN_QUEUE_ATTR = 4, 41 + EFA_ADMIN_QUEUE_ATTR_1 = 4, 42 42 EFA_ADMIN_HW_HINTS = 5, 43 43 EFA_ADMIN_HOST_INFO = 6, 44 44 EFA_ADMIN_EVENT_QUEUE_ATTR = 7, 45 + EFA_ADMIN_QUEUE_ATTR_2 = 9, 45 46 }; 46 47 47 48 /* QP transport type */ ··· 745 744 u32 reserved1; 746 745 }; 747 746 748 - struct efa_admin_feature_queue_attr_desc { 747 + struct efa_admin_feature_queue_attr_desc_1 { 749 748 /* The maximum number of queue pairs supported */ 750 749 u32 max_qp; 751 750 752 751 /* Maximum number of WQEs per Send Queue */ 753 752 u32 max_sq_depth; 754 753 755 - /* Maximum size of data that can be sent inline in a Send WQE */ 754 + /* 755 + * Maximum size of data that can be sent inline in a Send WQE 756 + * (deprecated by 757 + * efa_admin_feature_queue_attr_desc_2::inline_buf_size_ex on 758 + * supporting devices) 759 + */ 756 760 u32 inline_buf_size; 757 761 758 762 /* Maximum number of buffer descriptors per Recv Queue */ ··· 809 803 * two consecutive doorbells. Zero means unlimited. 
810 804 */ 811 805 u16 max_tx_batch; 806 + }; 807 + 808 + struct efa_admin_feature_queue_attr_desc_2 { 809 + /* Maximum size of data that can be sent inline in a Send WQE */ 810 + u16 inline_buf_size_ex; 812 811 }; 813 812 814 813 struct efa_admin_event_queue_attr_desc { ··· 883 872 884 873 struct efa_admin_feature_network_attr_desc network_attr; 885 874 886 - struct efa_admin_feature_queue_attr_desc queue_attr; 875 + struct efa_admin_feature_queue_attr_desc_1 queue_attr_1; 876 + 877 + struct efa_admin_feature_queue_attr_desc_2 queue_attr_2; 887 878 888 879 struct efa_admin_event_queue_attr_desc event_queue_attr; 889 880

+35 -20

drivers/infiniband/hw/efa/efa_com_cmd.c

··· 1 1 // SPDX-License-Identifier: GPL-2.0 OR BSD-2-Clause 2 2 /* 3 - * Copyright 2018-2025 Amazon.com, Inc. or its affiliates. All rights reserved. 3 + * Copyright 2018-2026 Amazon.com, Inc. or its affiliates. All rights reserved. 4 4 */ 5 5 6 6 #include "efa_com.h" ··· 479 479 480 480 edev->supported_features = resp.u.device_attr.supported_features; 481 481 err = efa_com_get_feature(edev, &resp, 482 - EFA_ADMIN_QUEUE_ATTR); 482 + EFA_ADMIN_QUEUE_ATTR_1); 483 483 if (err) { 484 484 ibdev_err_ratelimited(edev->efa_dev, 485 - "Failed to get queue attributes %d\n", 485 + "Failed to get queue attributes1 %d\n", 486 486 err); 487 487 return err; 488 488 } 489 489 490 - result->max_qp = resp.u.queue_attr.max_qp; 491 - result->max_sq_depth = resp.u.queue_attr.max_sq_depth; 492 - result->max_rq_depth = resp.u.queue_attr.max_rq_depth; 493 - result->max_cq = resp.u.queue_attr.max_cq; 494 - result->max_cq_depth = resp.u.queue_attr.max_cq_depth; 495 - result->inline_buf_size = resp.u.queue_attr.inline_buf_size; 496 - result->max_sq_sge = resp.u.queue_attr.max_wr_send_sges; 497 - result->max_rq_sge = resp.u.queue_attr.max_wr_recv_sges; 498 - result->max_mr = resp.u.queue_attr.max_mr; 499 - result->max_mr_pages = resp.u.queue_attr.max_mr_pages; 500 - result->max_pd = resp.u.queue_attr.max_pd; 501 - result->max_ah = resp.u.queue_attr.max_ah; 502 - result->max_llq_size = resp.u.queue_attr.max_llq_size; 503 - result->sub_cqs_per_cq = resp.u.queue_attr.sub_cqs_per_cq; 504 - result->max_wr_rdma_sge = resp.u.queue_attr.max_wr_rdma_sges; 505 - result->max_tx_batch = resp.u.queue_attr.max_tx_batch; 506 - result->min_sq_depth = resp.u.queue_attr.min_sq_depth; 490 + result->max_qp = resp.u.queue_attr_1.max_qp; 491 + result->max_sq_depth = resp.u.queue_attr_1.max_sq_depth; 492 + result->max_rq_depth = resp.u.queue_attr_1.max_rq_depth; 493 + result->max_cq = resp.u.queue_attr_1.max_cq; 494 + result->max_cq_depth = resp.u.queue_attr_1.max_cq_depth; 495 + result->inline_buf_size = 
resp.u.queue_attr_1.inline_buf_size; 496 + result->max_sq_sge = resp.u.queue_attr_1.max_wr_send_sges; 497 + result->max_rq_sge = resp.u.queue_attr_1.max_wr_recv_sges; 498 + result->max_mr = resp.u.queue_attr_1.max_mr; 499 + result->max_mr_pages = resp.u.queue_attr_1.max_mr_pages; 500 + result->max_pd = resp.u.queue_attr_1.max_pd; 501 + result->max_ah = resp.u.queue_attr_1.max_ah; 502 + result->max_llq_size = resp.u.queue_attr_1.max_llq_size; 503 + result->sub_cqs_per_cq = resp.u.queue_attr_1.sub_cqs_per_cq; 504 + result->max_wr_rdma_sge = resp.u.queue_attr_1.max_wr_rdma_sges; 505 + result->max_tx_batch = resp.u.queue_attr_1.max_tx_batch; 506 + result->min_sq_depth = resp.u.queue_attr_1.min_sq_depth; 507 + 508 + if (efa_com_check_supported_feature_id(edev, EFA_ADMIN_QUEUE_ATTR_2)) { 509 + err = efa_com_get_feature(edev, &resp, 510 + EFA_ADMIN_QUEUE_ATTR_2); 511 + if (err) { 512 + ibdev_err_ratelimited( 513 + edev->efa_dev, 514 + "Failed to get queue attributes2 %d\n", err); 515 + return err; 516 + } 517 + 518 + result->inline_buf_size_ex = resp.u.queue_attr_2.inline_buf_size_ex; 519 + } else { 520 + result->inline_buf_size_ex = result->inline_buf_size; 521 + } 507 522 508 523 err = efa_com_get_feature(edev, &resp, EFA_ADMIN_NETWORK_ATTR); 509 524 if (err) {
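The EFA change only queries `EFA_ADMIN_QUEUE_ATTR_2` when the device advertises it. Otherwise it falls back so that `inline_buf_size_ex` equals the legacy `inline_buf_size`. Later checks (for example the `max_inline_data` test in `efa_verbs.c` from the same merge) can therefore use `inline_buf_size_ex` unconditionally.

Below is a minimal sketch of that fallback. It assumes, without confirmation from the diff, that `efa_com_check_supported_feature_id()` tests a bit indexed by feature ID in `supported_features`. The feature ID values 4 and 9 are taken from the diff.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Feature IDs as numbered in efa_admin_cmds_defs.h. */
enum { DEMO_QUEUE_ATTR_1 = 4, DEMO_QUEUE_ATTR_2 = 9 };

struct demo_attrs {
	uint32_t inline_buf_size;
	uint32_t inline_buf_size_ex;
};

/* Assumption: one bit per feature ID in the device's supported mask. */
static bool demo_feature_supported(uint32_t supported, unsigned int id)
{
	return supported & (1u << id);
}

static void demo_fill_inline(struct demo_attrs *a, uint32_t supported,
			     uint32_t legacy, uint16_t ext)
{
	a->inline_buf_size = legacy;
	if (demo_feature_supported(supported, DEMO_QUEUE_ATTR_2))
		a->inline_buf_size_ex = ext;
	else
		a->inline_buf_size_ex = a->inline_buf_size; /* older device */
}

/* Mirrors the create_qp validation: reject requests above the limit. */
static int demo_check_max_inline(const struct demo_attrs *a, uint32_t req)
{
	return req > a->inline_buf_size_ex ? -EINVAL : 0;
}
```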

+2 -1

drivers/infiniband/hw/efa/efa_com_cmd.h

··· 1 1 /* SPDX-License-Identifier: GPL-2.0 OR BSD-2-Clause */ 2 2 /* 3 - * Copyright 2018-2025 Amazon.com, Inc. or its affiliates. All rights reserved. 3 + * Copyright 2018-2026 Amazon.com, Inc. or its affiliates. All rights reserved. 4 4 */ 5 5 6 6 #ifndef _EFA_COM_CMD_H_ ··· 127 127 u32 max_cq; 128 128 u32 max_cq_depth; /* cqes */ 129 129 u32 inline_buf_size; 130 + u32 inline_buf_size_ex; 130 131 u32 max_mr; 131 132 u32 max_pd; 132 133 u32 max_ah;

+1 -2

drivers/infiniband/hw/efa/efa_main.c

··· 371 371 .alloc_hw_device_stats = efa_alloc_hw_device_stats, 372 372 .alloc_pd = efa_alloc_pd, 373 373 .alloc_ucontext = efa_alloc_ucontext, 374 - .create_cq = efa_create_cq, 375 - .create_cq_umem = efa_create_cq_umem, 374 + .create_user_cq = efa_create_user_cq, 376 375 .create_qp = efa_create_qp, 377 376 .create_user_ah = efa_create_ah, 378 377 .dealloc_pd = efa_dealloc_pd,
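With `.create_cq` and `.create_cq_umem` collapsed into `.create_user_cq`, one callback serves both cases:

- If a user buffer arrived, it is found on `ibcq->umem` (per the `efa_verbs.c` hunk).
- If not, the driver allocates the ring itself.

Teardown then keys off `cq->cpu_addr`, which is non-NULL only for driver-allocated memory.

Below is a simplified sketch of that ownership rule. All `demo_` types are hypothetical stand-ins for `ib_cq` / `efa_cq`, and `calloc()` stands in for the DMA-mapped allocation.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

struct demo_umem { size_t length; uint64_t dma_start; };

struct demo_cq {
	struct demo_umem *umem; /* set by the core if userspace gave a buffer */
	void *cpu_addr;         /* non-NULL only if the driver allocated */
	uint64_t dma_addr;
	size_t size;
};

struct demo_ops {
	int (*create_user_cq)(struct demo_cq *cq, size_t size);
};

static int demo_create_user_cq(struct demo_cq *cq, size_t size)
{
	cq->size = size;
	if (cq->umem) {
		/* External memory: validate it but do not take ownership. */
		if (cq->umem->length < size)
			return -EINVAL;
		cq->cpu_addr = NULL;
		cq->dma_addr = cq->umem->dma_start;
		return 0;
	}
	/* No external memory: the driver owns the ring. */
	cq->cpu_addr = calloc(1, size);
	if (!cq->cpu_addr)
		return -ENOMEM;
	cq->dma_addr = (uint64_t)(uintptr_t)cq->cpu_addr;
	return 0;
}

static void demo_destroy_cq(struct demo_cq *cq)
{
	/* cpu_addr doubles as the "driver must free this" flag. */
	if (cq->cpu_addr)
		free(cq->cpu_addr);
	cq->cpu_addr = NULL;
}

static const struct demo_ops demo_ops = {
	.create_user_cq = demo_create_user_cq,
};
```

A single entry point removes the thin wrapper (`efa_create_cq()` calling `efa_create_cq_umem()` with `NULL`). It also makes releasing the umem the core's job rather than the driver's.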

+39 -84

drivers/infiniband/hw/efa/efa_verbs.c

··· 1 1 // SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB 2 2 /* 3 - * Copyright 2018-2024 Amazon.com, Inc. or its affiliates. All rights reserved. 3 + * Copyright 2018-2026 Amazon.com, Inc. or its affiliates. All rights reserved. 4 4 */ 5 5 6 6 #include <linux/dma-buf.h> ··· 9 9 #include <linux/log2.h> 10 10 11 11 #include <rdma/ib_addr.h> 12 - #include <rdma/ib_umem.h> 13 12 #include <rdma/ib_user_verbs.h> 14 13 #include <rdma/ib_verbs.h> 14 + #include <rdma/iter.h> 15 15 #include <rdma/uverbs_ioctl.h> 16 16 #define UVERBS_MODULE_NAME efa_ib 17 17 #include <rdma/uverbs_named_ioctl.h> ··· 579 579 580 580 resp->llq_desc_offset &= ~PAGE_MASK; 581 581 582 - if (qp->rq_size) { 582 + if (qp->rq_cpu_addr) { 583 583 address = dev->db_bar_addr + resp->rq_db_offset; 584 584 585 585 qp->rq_db_mmap_entry = ··· 641 641 init_attr->cap.max_recv_sge, dev->dev_attr.max_rq_sge); 642 642 return -EINVAL; 643 643 } 644 - if (init_attr->cap.max_inline_data > dev->dev_attr.inline_buf_size) { 644 + if (init_attr->cap.max_inline_data > dev->dev_attr.inline_buf_size_ex) { 645 645 ibdev_dbg(&dev->ibdev, 646 646 "qp: requested inline data[%u] exceeds the max[%u]\n", 647 647 init_attr->cap.max_inline_data, 648 - dev->dev_attr.inline_buf_size); 648 + dev->dev_attr.inline_buf_size_ex); 649 649 return -EINVAL; 650 650 } 651 651 ··· 682 682 struct efa_com_create_qp_result create_qp_resp; 683 683 struct efa_dev *dev = to_edev(ibqp->device); 684 684 struct efa_ibv_create_qp_resp resp = {}; 685 - struct efa_ibv_create_qp cmd = {}; 685 + struct efa_ibv_create_qp cmd; 686 686 struct efa_qp *qp = to_eqp(ibqp); 687 687 struct efa_ucontext *ucontext; 688 688 u16 supported_efa_flags = 0; ··· 699 699 if (err) 700 700 goto err_out; 701 701 702 - if (offsetofend(typeof(cmd), driver_qp_type) > udata->inlen) { 703 - ibdev_dbg(&dev->ibdev, 704 - "Incompatible ABI params, no input udata\n"); 705 - err = -EINVAL; 702 + err = ib_copy_validate_udata_in_cm(udata, cmd, driver_qp_type, 0); 703 + if (err) 706 704 
goto err_out; 707 - } 708 705 709 - if (udata->inlen > sizeof(cmd) && 710 - !ib_is_udata_cleared(udata, sizeof(cmd), 711 - udata->inlen - sizeof(cmd))) { 712 - ibdev_dbg(&dev->ibdev, 713 - "Incompatible ABI params, unknown fields in udata\n"); 714 - err = -EINVAL; 715 - goto err_out; 716 - } 717 - 718 - err = ib_copy_from_udata(&cmd, udata, 719 - min(sizeof(cmd), udata->inlen)); 720 - if (err) { 721 - ibdev_dbg(&dev->ibdev, 722 - "Cannot copy udata for create_qp\n"); 723 - goto err_out; 724 - } 725 - 726 - if (cmd.comp_mask || !is_reserved_cleared(cmd.reserved_98)) { 706 + if (!is_reserved_cleared(cmd.reserved_98)) { 727 707 ibdev_dbg(&dev->ibdev, 728 708 "Incompatible ABI params, unknown fields in udata\n"); 729 709 err = -EINVAL; ··· 808 828 err_destroy_qp: 809 829 efa_destroy_qp_handle(dev, create_qp_resp.qp_handle); 810 830 err_free_mapped: 811 - if (qp->rq_size) 831 + if (qp->rq_cpu_addr) 812 832 efa_free_mapped(dev, qp->rq_cpu_addr, qp->rq_dma_addr, 813 833 qp->rq_size, DMA_TO_DEVICE); 814 834 err_out: ··· 1063 1083 cq->cq_idx, cq->cpu_addr, cq->size, &cq->dma_addr); 1064 1084 1065 1085 efa_destroy_cq_idx(dev, cq->cq_idx); 1066 - efa_cq_user_mmap_entries_remove(cq); 1086 + if (cq->cpu_addr) 1087 + efa_cq_user_mmap_entries_remove(cq); 1067 1088 if (cq->eq) { 1068 1089 xa_erase(&dev->cqs_xa, cq->cq_idx); 1069 1090 synchronize_irq(cq->eq->irq.irqn); 1070 1091 } 1071 1092 1072 - if (cq->umem) 1073 - ib_umem_release(cq->umem); 1074 - else 1093 + if (cq->cpu_addr) 1075 1094 efa_free_mapped(dev, cq->cpu_addr, cq->dma_addr, cq->size, DMA_FROM_DEVICE); 1076 1095 return 0; 1077 1096 } ··· 1110 1131 return 0; 1111 1132 } 1112 1133 1113 - int efa_create_cq_umem(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr, 1114 - struct ib_umem *umem, struct uverbs_attr_bundle *attrs) 1134 + int efa_create_user_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr, 1135 + struct uverbs_attr_bundle *attrs) 1115 1136 { 1116 1137 struct ib_udata *udata = &attrs->driver_udata; 
1117 1138 struct efa_ucontext *ucontext = rdma_udata_to_drv_context( ··· 1121 1142 struct efa_com_create_cq_result result; 1122 1143 struct ib_device *ibdev = ibcq->device; 1123 1144 struct efa_dev *dev = to_edev(ibdev); 1124 - struct efa_ibv_create_cq cmd = {}; 1145 + struct efa_ibv_create_cq cmd; 1125 1146 struct efa_cq *cq = to_ecq(ibcq); 1126 1147 int entries = attr->cqe; 1127 1148 bool set_src_addr; ··· 1132 1153 if (attr->flags) 1133 1154 return -EOPNOTSUPP; 1134 1155 1135 - if (entries < 1 || entries > dev->dev_attr.max_cq_depth) { 1156 + if (entries > dev->dev_attr.max_cq_depth) { 1136 1157 ibdev_dbg(ibdev, 1137 - "cq: requested entries[%u] non-positive or greater than max[%u]\n", 1158 + "cq: requested entries[%u] greater than max[%u]\n", 1138 1159 entries, dev->dev_attr.max_cq_depth); 1139 1160 err = -EINVAL; 1140 1161 goto err_out; 1141 1162 } 1142 1163 1143 - if (offsetofend(typeof(cmd), num_sub_cqs) > udata->inlen) { 1144 - ibdev_dbg(ibdev, 1145 - "Incompatible ABI params, no input udata\n"); 1146 - err = -EINVAL; 1164 + err = ib_copy_validate_udata_in_cm(udata, cmd, num_sub_cqs, 0); 1165 + if (err) 1147 1166 goto err_out; 1148 - } 1149 1167 1150 - if (udata->inlen > sizeof(cmd) && 1151 - !ib_is_udata_cleared(udata, sizeof(cmd), 1152 - udata->inlen - sizeof(cmd))) { 1153 - ibdev_dbg(ibdev, 1154 - "Incompatible ABI params, unknown fields in udata\n"); 1155 - err = -EINVAL; 1156 - goto err_out; 1157 - } 1158 - 1159 - err = ib_copy_from_udata(&cmd, udata, 1160 - min(sizeof(cmd), udata->inlen)); 1161 - if (err) { 1162 - ibdev_dbg(ibdev, "Cannot copy udata for create_cq\n"); 1163 - goto err_out; 1164 - } 1165 - 1166 - if (cmd.comp_mask || !is_reserved_cleared(cmd.reserved_58)) { 1168 + if (!is_reserved_cleared(cmd.reserved_58)) { 1167 1169 ibdev_dbg(ibdev, 1168 1170 "Incompatible ABI params, unknown fields in udata\n"); 1169 1171 err = -EINVAL; ··· 1172 1212 cq->ucontext = ucontext; 1173 1213 cq->size = PAGE_ALIGN(cmd.cq_entry_size * entries * 
cmd.num_sub_cqs); 1174 1214 1175 - if (umem) { 1176 - if (umem->length < cq->size) { 1215 + if (ibcq->umem) { 1216 + if (ibcq->umem->length < cq->size) { 1177 1217 ibdev_dbg(&dev->ibdev, "External memory too small\n"); 1178 1218 err = -EINVAL; 1179 1219 goto err_out; 1180 1220 } 1181 1221 1182 - if (!ib_umem_is_contiguous(umem)) { 1222 + if (!ib_umem_is_contiguous(ibcq->umem)) { 1183 1223 ibdev_dbg(&dev->ibdev, "Non contiguous CQ unsupported\n"); 1184 1224 err = -EINVAL; 1185 1225 goto err_out; 1186 1226 } 1187 1227 1188 - cq->cpu_addr = NULL; 1189 - cq->dma_addr = ib_umem_start_dma_addr(umem); 1190 - cq->umem = umem; 1228 + cq->dma_addr = ib_umem_start_dma_addr(ibcq->umem); 1191 1229 } else { 1192 1230 cq->cpu_addr = efa_zalloc_mapped(dev, &cq->dma_addr, cq->size, 1193 1231 DMA_FROM_DEVICE); ··· 1217 1259 cq->ibcq.cqe = result.actual_depth; 1218 1260 WARN_ON_ONCE(entries != result.actual_depth); 1219 1261 1220 - if (!umem) 1262 + if (cq->cpu_addr) 1221 1263 err = cq_mmap_entries_setup(dev, cq, &resp, result.db_valid); 1222 1264 1223 1265 if (err) { ··· 1254 1296 if (cq->eq) 1255 1297 xa_erase(&dev->cqs_xa, cq->cq_idx); 1256 1298 err_remove_mmap: 1257 - efa_cq_user_mmap_entries_remove(cq); 1299 + if (cq->cpu_addr) 1300 + efa_cq_user_mmap_entries_remove(cq); 1258 1301 err_destroy_cq: 1259 1302 efa_destroy_cq_idx(dev, cq->cq_idx); 1260 1303 err_free_mapped: 1261 - if (!umem) 1304 + if (cq->cpu_addr) 1262 1305 efa_free_mapped(dev, cq->cpu_addr, cq->dma_addr, cq->size, 1263 1306 DMA_FROM_DEVICE); 1264 1307 err_out: 1265 1308 atomic64_inc(&dev->stats.create_cq_err); 1266 1309 return err; 1267 - } 1268 - 1269 - int efa_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr, 1270 - struct uverbs_attr_bundle *attrs) 1271 - { 1272 - return efa_create_cq_umem(ibcq, attr, NULL, attrs); 1273 1310 } 1274 1311 1275 1312 static int umem_to_page_list(struct efa_dev *dev, ··· 1878 1925 return efa_com_dealloc_uar(&dev->edev, &params); 1879 1926 } 1880 1927 1881 - #define 
EFA_CHECK_USER_COMP(_dev, _comp_mask, _attr, _mask, _attr_str) \ 1882 - (_attr_str = (!(_dev)->dev_attr._attr || ((_comp_mask) & (_mask))) ? \ 1928 + #define EFA_CHECK_USER_SUPP(_dev, _supported_caps, _attr, _mask, _attr_str) \ 1929 + (_attr_str = (!(_dev)->dev_attr._attr || ((_supported_caps) & (_mask))) ? \ 1883 1930 NULL : #_attr) 1884 1931 1885 - static int efa_user_comp_handshake(const struct ib_ucontext *ibucontext, 1932 + static int efa_user_supp_handshake(const struct ib_ucontext *ibucontext, 1886 1933 const struct efa_ibv_alloc_ucontext_cmd *cmd) 1887 1934 { 1888 1935 struct efa_dev *dev = to_edev(ibucontext->device); 1889 1936 char *attr_str; 1890 1937 1891 - if (EFA_CHECK_USER_COMP(dev, cmd->comp_mask, max_tx_batch, 1892 - EFA_ALLOC_UCONTEXT_CMD_COMP_TX_BATCH, attr_str)) 1938 + if (EFA_CHECK_USER_SUPP(dev, cmd->supported_caps, max_tx_batch, 1939 + EFA_ALLOC_UCONTEXT_CMD_SUPP_CAPS_TX_BATCH, 1940 + attr_str)) 1893 1941 goto err; 1894 1942 1895 - if (EFA_CHECK_USER_COMP(dev, cmd->comp_mask, min_sq_depth, 1896 - EFA_ALLOC_UCONTEXT_CMD_COMP_MIN_SQ_WR, 1943 + if (EFA_CHECK_USER_SUPP(dev, cmd->supported_caps, min_sq_depth, 1944 + EFA_ALLOC_UCONTEXT_CMD_SUPP_CAPS_MIN_SQ_WR, 1897 1945 attr_str)) 1898 1946 goto err; 1899 1947 ··· 1928 1974 goto err_out; 1929 1975 } 1930 1976 1931 - err = efa_user_comp_handshake(ibucontext, &cmd); 1977 + err = efa_user_supp_handshake(ibucontext, &cmd); 1932 1978 if (err) 1933 1979 goto err_out; 1934 1980 ··· 1942 1988 resp.cmds_supp_udata_mask |= EFA_USER_CMDS_SUPP_UDATA_CREATE_AH; 1943 1989 resp.sub_cqs_per_cq = dev->dev_attr.sub_cqs_per_cq; 1944 1990 resp.inline_buf_size = dev->dev_attr.inline_buf_size; 1991 + resp.inline_buf_size_ex = dev->dev_attr.inline_buf_size_ex; 1945 1992 resp.max_llq_size = dev->dev_attr.max_llq_size; 1946 1993 resp.max_tx_batch = dev->dev_attr.max_tx_batch; 1947 1994 resp.min_sq_wr = dev->dev_attr.min_sq_depth;

-1

drivers/infiniband/hw/erdma/erdma.h

··· 127 127 unsigned char peer_addr[ETH_ALEN]; 128 128 unsigned long cap_flags; 129 129 130 - int numa_node; 131 130 enum erdma_cc_alg cc; 132 131 u32 irq_num; 133 132

+2 -1

drivers/infiniband/hw/erdma/erdma_eq.c

··· 197 197 tasklet_init(&dev->ceqs[ceqn].tasklet, erdma_intr_ceq_task, 198 198 (unsigned long)&dev->ceqs[ceqn]); 199 199 200 - cpumask_set_cpu(cpumask_local_spread(ceqn + 1, dev->attrs.numa_node), 200 + cpumask_set_cpu(cpumask_local_spread(ceqn + 1, 201 + dev_to_node(&dev->pdev->dev)), 201 202 &eqc->irq.affinity_hint_mask); 202 203 203 204 err = request_irq(eqc->irq.msix_vector, erdma_intr_ceq_handler, 0,

-1

drivers/infiniband/hw/erdma/erdma_main.c

··· 261 261 262 262 pci_set_drvdata(pdev, dev); 263 263 dev->pdev = pdev; 264 - dev->attrs.numa_node = dev_to_node(&pdev->dev); 265 264 266 265 bars = pci_select_bars(pdev, IORESOURCE_MEM); 267 266 err = pci_request_selected_regions(pdev, bars, DRV_MODULE_NAME);

+3 -5

drivers/infiniband/hw/erdma/erdma_verbs.c

··· 12 12 #include <linux/vmalloc.h> 13 13 #include <net/addrconf.h> 14 14 #include <rdma/erdma-abi.h> 15 - #include <rdma/ib_umem.h> 15 + #include <rdma/iter.h> 16 16 #include <rdma/uverbs_ioctl.h> 17 17 18 18 #include "erdma.h" ··· 1039 1039 qp->attrs.rq_size = roundup_pow_of_two(attrs->cap.max_recv_wr); 1040 1040 1041 1041 if (uctx) { 1042 - ret = ib_copy_from_udata(&ureq, udata, 1043 - min(sizeof(ureq), udata->inlen)); 1042 + ret = ib_copy_validate_udata_in(udata, ureq, rsvd0); 1044 1043 if (ret) 1045 1044 goto err_out_xa; 1046 1045 ··· 1979 1980 struct erdma_ureq_create_cq ureq; 1980 1981 struct erdma_uresp_create_cq uresp; 1981 1982 1982 - ret = ib_copy_from_udata(&ureq, udata, 1983 - min(udata->inlen, sizeof(ureq))); 1983 + ret = ib_copy_validate_udata_in(udata, ureq, rsvd0); 1984 1984 if (ret) 1985 1985 goto err_out_xa; 1986 1986

+1 -3

drivers/infiniband/hw/hfi1/Makefile

··· 49 49 user_pages.o \ 50 50 user_sdma.o \ 51 51 verbs.o \ 52 - verbs_txreq.o \ 53 - vnic_main.o \ 54 - vnic_sdma.o 52 + verbs_txreq.o 55 53 56 54 ifdef CONFIG_DEBUG_FS 57 55 hfi1-y += debugfs.o

+1 -1

drivers/infiniband/hw/hfi1/aspm.c

··· 179 179 } 180 180 181 181 /* 182 - * Disable interrupt processing for verbs contexts when PSM or VNIC contexts 182 + * Disable interrupt processing for verbs contexts when PSM contexts 183 183 * are open. 184 184 */ 185 185 void aspm_disable_all(struct hfi1_devdata *dd)

+7 -47

drivers/infiniband/hw/hfi1/chip.c

··· 85 85 /* 86 86 * RSM instance allocation 87 87 * 0 - User Fecn Handling 88 - * 1 - Vnic 88 + * 1 - Deprecated 89 89 * 2 - AIP 90 90 * 3 - Verbs 91 91 */ 92 92 #define RSM_INS_FECN 0 93 - #define RSM_INS_VNIC 1 93 + #define RSM_INS_DEPRECATED 1 94 94 #define RSM_INS_AIP 2 95 95 #define RSM_INS_VERBS 3 96 96 ··· 151 151 #define DETH_AIP_SQPN_OFFSET(off) ((DETH_AIP_SQPN_QW << QW_SHIFT) | (off)) 152 152 #define DETH_AIP_SQPN_SELECT_OFFSET \ 153 153 DETH_AIP_SQPN_OFFSET(DETH_AIP_SQPN_BIT_OFFSET) 154 - 155 - /* RSM fields for Vnic */ 156 - /* L2_TYPE: QW 0, OFFSET 61 - for match */ 157 - #define L2_TYPE_QW 0ull 158 - #define L2_TYPE_BIT_OFFSET 61ull 159 - #define L2_TYPE_OFFSET(off) ((L2_TYPE_QW << QW_SHIFT) | (off)) 160 - #define L2_TYPE_MATCH_OFFSET L2_TYPE_OFFSET(L2_TYPE_BIT_OFFSET) 161 - #define L2_TYPE_MASK 3ull 162 - #define L2_16B_VALUE 2ull 163 154 164 155 /* L4_TYPE QW 1, OFFSET 0 - for match */ 165 156 #define L4_TYPE_QW 1ull ··· 6835 6844 for (i = 0; i < dd->num_rcv_contexts; i++) { 6836 6845 rcd = hfi1_rcd_get_by_index(dd, i); 6837 6846 6838 - /* Ensure all non-user contexts(including vnic) are enabled */ 6847 + /* Ensure all non-user contexts are enabled */ 6839 6848 if (!rcd || 6840 - (i >= dd->first_dyn_alloc_ctxt && !rcd->is_vnic)) { 6849 + (i >= dd->first_dyn_alloc_ctxt)) { 6841 6850 hfi1_rcd_put(rcd); 6842 6851 continue; 6843 6852 } ··· 8458 8467 return work_done; 8459 8468 } 8460 8469 8461 - /* Receive packet napi handler for netdevs VNIC and AIP */ 8470 + /* Receive packet napi handler for netdevs AIP */ 8462 8471 irqreturn_t receive_context_interrupt_napi(int irq, void *data) 8463 8472 { 8464 8473 struct hfi1_ctxtdata *rcd = data; ··· 14497 14506 int ctxt_count = hfi1_netdev_ctxt_count(dd); 14498 14507 14499 14508 /* We already have contexts mapped in RMT */ 14500 - if (has_rsm_rule(dd, RSM_INS_VNIC) || has_rsm_rule(dd, RSM_INS_AIP)) { 14509 + if (has_rsm_rule(dd, RSM_INS_AIP)) { 14501 14510 dd_dev_info(dd, "Contexts are already mapped in 
RMT\n"); 14502 14511 return true; 14503 14512 } ··· 14576 14585 14577 14586 hfi1_enable_rsm_rule(dd, RSM_INS_AIP, &rrd); 14578 14587 } 14579 - } 14580 - 14581 - /* Initialize RSM for VNIC */ 14582 - void hfi1_init_vnic_rsm(struct hfi1_devdata *dd) 14583 - { 14584 - int rmt_start = hfi1_netdev_get_free_rmt_idx(dd); 14585 - struct rsm_rule_data rrd = { 14586 - /* Add rule for vnic */ 14587 - .offset = rmt_start, 14588 - .pkt_type = 4, 14589 - /* Match 16B packets */ 14590 - .field1_off = L2_TYPE_MATCH_OFFSET, 14591 - .mask1 = L2_TYPE_MASK, 14592 - .value1 = L2_16B_VALUE, 14593 - /* Match ETH L4 packets */ 14594 - .field2_off = L4_TYPE_MATCH_OFFSET, 14595 - .mask2 = L4_16B_TYPE_MASK, 14596 - .value2 = L4_16B_ETH_VALUE, 14597 - /* Calc context from veswid and entropy */ 14598 - .index1_off = L4_16B_HDR_VESWID_OFFSET, 14599 - .index1_width = ilog2(NUM_NETDEV_MAP_ENTRIES), 14600 - .index2_off = L2_16B_ENTROPY_OFFSET, 14601 - .index2_width = ilog2(NUM_NETDEV_MAP_ENTRIES) 14602 - }; 14603 - 14604 - hfi1_enable_rsm_rule(dd, RSM_INS_VNIC, &rrd); 14605 - } 14606 - 14607 - void hfi1_deinit_vnic_rsm(struct hfi1_devdata *dd) 14608 - { 14609 - clear_rsm_rule(dd, RSM_INS_VNIC); 14610 14588 } 14611 14589 14612 14590 void hfi1_deinit_aip_rsm(struct hfi1_devdata *dd) ··· 15155 15195 (dd->revision >> CCE_REVISION_SW_SHIFT) 15156 15196 & CCE_REVISION_SW_MASK); 15157 15197 15158 - /* alloc VNIC/AIP rx data */ 15198 + /* alloc AIP rx data */ 15159 15199 ret = hfi1_alloc_rx(dd); 15160 15200 if (ret) 15161 15201 goto bail_cleanup;

-2

drivers/infiniband/hw/hfi1/chip.h

··· 1392 1392 u16 pkey); 1393 1393 int hfi1_clear_ctxt_pkey(struct hfi1_devdata *dd, struct hfi1_ctxtdata *ctxt); 1394 1394 void hfi1_read_link_quality(struct hfi1_devdata *dd, u8 *link_quality); 1395 - void hfi1_init_vnic_rsm(struct hfi1_devdata *dd); 1396 - void hfi1_deinit_vnic_rsm(struct hfi1_devdata *dd); 1397 1395 1398 1396 irqreturn_t general_interrupt(int irq, void *data); 1399 1397 irqreturn_t sdma_interrupt(int irq, void *data);

+6 -7

drivers/infiniband/hw/hfi1/driver.c

··· 20 20 #include "qp.h" 21 21 #include "sdma.h" 22 22 #include "debugfs.h" 23 - #include "vnic.h" 24 23 #include "fault.h" 25 24 26 25 #include "ipoib.h" ··· 908 909 u16 i; 909 910 910 911 /* 911 - * For dynamically allocated kernel contexts (like vnic) switch 912 + * For dynamically allocated kernel contexts switch 912 913 * interrupt handler only for that context. Otherwise, switch 913 914 * interrupt handler for all statically allocated kernel contexts. 914 915 */ 915 - if (rcd->ctxt >= dd->first_dyn_alloc_ctxt && !rcd->is_vnic) { 916 + if (rcd->ctxt >= dd->first_dyn_alloc_ctxt) { 916 917 hfi1_rcd_get(rcd); 917 918 hfi1_set_fast(rcd); 918 919 hfi1_rcd_put(rcd); ··· 921 922 922 923 for (i = HFI1_CTRL_CTXT + 1; i < dd->num_rcv_contexts; i++) { 923 924 rcd = hfi1_rcd_get_by_index(dd, i); 924 - if (rcd && (i < dd->first_dyn_alloc_ctxt || rcd->is_vnic)) 925 + if (rcd && (i < dd->first_dyn_alloc_ctxt)) 925 926 hfi1_set_fast(rcd); 926 927 hfi1_rcd_put(rcd); 927 928 } ··· 937 938 rcd = hfi1_rcd_get_by_index(dd, i); 938 939 if (!rcd) 939 940 continue; 940 - if (i < dd->first_dyn_alloc_ctxt || rcd->is_vnic) 941 + if (i < dd->first_dyn_alloc_ctxt) 941 942 rcd->do_interrupt = rcd->slow_handler; 942 943 943 944 hfi1_rcd_put(rcd); ··· 1399 1400 goto bail; 1400 1401 } 1401 1402 1402 - /* If there are any user/vnic contexts, we cannot reset */ 1403 + /* If there are any user contexts, we cannot reset */ 1403 1404 mutex_lock(&hfi1_mutex); 1404 1405 if (dd->rcd) 1405 1406 if (hfi1_stats.sps_ctxts) { ··· 1898 1899 [RHF_RCV_TYPE_EAGER] = process_receive_invalid, 1899 1900 [RHF_RCV_TYPE_IB] = hfi1_ipoib_ib_rcv, 1900 1901 [RHF_RCV_TYPE_ERROR] = process_receive_error, 1901 - [RHF_RCV_TYPE_BYPASS] = hfi1_vnic_bypass_rcv, 1902 + [RHF_RCV_TYPE_BYPASS] = process_receive_invalid, 1902 1903 [RHF_RCV_TYPE_INVALID5] = process_receive_invalid, 1903 1904 [RHF_RCV_TYPE_INVALID6] = process_receive_invalid, 1904 1905 [RHF_RCV_TYPE_INVALID7] = process_receive_invalid,

-20

drivers/infiniband/hw/hfi1/hfi.h

··· 212 212 u8 rhf_offset; 213 213 /* dynamic receive available interrupt timeout */ 214 214 u8 rcvavail_timeout; 215 - /* Indicates that this is vnic context */ 216 - bool is_vnic; 217 - /* vnic queue index this context is mapped to */ 218 - u8 vnic_q_idx; 219 215 /* Is ASPM interrupt supported for this context */ 220 216 bool aspm_intr_supported; 221 217 /* ASPM state (enabled/disabled) for this context */ ··· 398 402 #define OPA_16B_L4_FM 0x08 399 403 #define OPA_16B_L4_IB_LOCAL 0x09 400 404 #define OPA_16B_L4_IB_GLOBAL 0x0A 401 - #define OPA_16B_L4_ETHR OPA_VNIC_L4_ETHR 402 405 403 406 /* 404 407 * OPA 16B Management ··· 992 997 #define NUM_MAP_ENTRIES 256 993 998 #define NUM_MAP_REGS 32 994 999 995 - /* Virtual NIC information */ 996 - struct hfi1_vnic_data { 997 - struct kmem_cache *txreq_cache; 998 - u8 num_vports; 999 - }; 1000 - 1001 - struct hfi1_vnic_vport_info; 1002 - 1003 1000 /* device data struct now contains only "general per-device" info. 1004 1001 * fields related to a physical IB port are in a hfi1_pportdata struct. 
1005 1002 */ ··· 1285 1298 send_routine process_dma_send; 1286 1299 void (*pio_inline_send)(struct hfi1_devdata *dd, struct pio_buf *pbuf, 1287 1300 u64 pbc, const void *from, size_t count); 1288 - int (*process_vnic_dma_send)(struct hfi1_devdata *dd, u8 q_idx, 1289 - struct hfi1_vnic_vport_info *vinfo, 1290 - struct sk_buff *skb, u64 pbc, u8 plen); 1291 1301 /* hfi1_pportdata, points to array of (physical) port-specific 1292 1302 * data structs, indexed by pidx (0..n-1) 1293 1303 */ ··· 1298 1314 u16 flags; 1299 1315 /* Number of physical ports available */ 1300 1316 u8 num_pports; 1301 - /* Lowest context number which can be used by user processes or VNIC */ 1302 1317 u8 first_dyn_alloc_ctxt; 1303 1318 /* adding a new field here would make it part of this cacheline */ 1304 1319 ··· 1336 1353 bool aspm_enabled; /* ASPM state: enabled/disabled */ 1337 1354 struct rhashtable *sdma_rht; 1338 1355 1339 - /* vnic data */ 1340 - struct hfi1_vnic_data vnic; 1341 1356 /* Lock to protect IRQ SRC register access */ 1342 1357 spinlock_t irq_src_lock; 1343 - int vnic_num_vports; 1344 1358 struct hfi1_netdev_rx *netdev_rx; 1345 1359 struct hfi1_affinity_node *affinity_entry; 1346 1360

+1 -3

drivers/infiniband/hw/hfi1/init.c

··· 26 26 #include "verbs.h" 27 27 #include "aspm.h" 28 28 #include "affinity.h" 29 - #include "vnic.h" 30 29 #include "exp_rcv.h" 31 30 #include "netdev.h" 32 31 ··· 348 349 * We do this here because we have to take into account all 349 350 * the RcvArray entries that previous context would have 350 351 * taken and we have to account for any extra groups assigned 351 - * to the static (kernel) or dynamic (vnic/user) contexts. 352 + * to the static (kernel) or dynamic (user) contexts. 352 353 */ 353 354 if (ctxt < dd->first_dyn_alloc_ctxt) { 354 355 if (ctxt < kctxt_ngroups) { ··· 850 851 dd->process_pio_send = hfi1_verbs_send_pio; 851 852 dd->process_dma_send = hfi1_verbs_send_dma; 852 853 dd->pio_inline_send = pio_copy; 853 - dd->process_vnic_dma_send = hfi1_vnic_send_dma; 854 854 855 855 if (is_ax(dd)) { 856 856 atomic_set(&dd->drop_packet, DROP_PACKET_ON);

-1

drivers/infiniband/hw/hfi1/mad.c

··· 12 12 #include "mad.h" 13 13 #include "trace.h" 14 14 #include "qp.h" 15 - #include "vnic.h" 16 15 17 16 /* the reset value from the FM is supposed to be 0xffff, handle both */ 18 17 #define OPA_LINK_WIDTH_RESET_OLD 0x0fff

+1 -3

drivers/infiniband/hw/hfi1/msix.c

··· 24 24 * one for the general, "slow path" interrupt 25 25 * one per used SDMA engine 26 26 * one per kernel receive context 27 - * one for each VNIC context 28 27 * ...any new IRQs should be added here. 29 28 */ 30 29 total = 1 + dd->num_sdma + dd->n_krcv_queues + dd->num_netdev_contexts; ··· 126 127 irq_handler_t thread, 127 128 const char *name) 128 129 { 129 - int nr = msix_request_irq(rcd->dd, rcd, handler, thread, 130 - rcd->is_vnic ? IRQ_NETDEVCTXT : IRQ_RCVCTXT, 130 + int nr = msix_request_irq(rcd->dd, rcd, handler, thread, IRQ_RCVCTXT, 131 131 name); 132 132 if (nr < 0) 133 133 return nr;

+2 -6

drivers/infiniband/hw/hfi1/netdev.h

··· 14 14 15 15 /** 16 16 * struct hfi1_netdev_rxq - Receive Queue for HFI 17 - * Both IPoIB and VNIC netdevices will be working on the rx abstraction. 17 + * IPoIB netdevices will be working on the rx abstraction. 18 18 * @napi: napi object 19 19 * @rx: ptr to netdev_rx 20 20 * @rcd: ptr to receive context data ··· 25 25 struct hfi1_ctxtdata *rcd; 26 26 }; 27 27 28 - /* 29 - * Number of netdev contexts used. Ensure it is less than or equal to 30 - * max queues supported by VNIC (HFI1_VNIC_MAX_QUEUE). 31 - */ 32 28 #define HFI1_MAX_NETDEV_CTXTS 8 33 29 34 30 /* Number of NETDEV RSM entries */ ··· 38 42 * @num_rx_q: number of receive queues 39 43 * @rmt_index: first free index in RMT Array 40 44 * @msix_start: first free MSI-X interrupt vector. 41 - * @dev_tbl: netdev table for unique identifier VNIC and IPoIb VLANs. 45 + * @dev_tbl: netdev table for unique identifier IPoIb VLANs. 42 46 * @enabled: atomic counter of netdevs enabling receive queues. 43 47 * When 0 NAPI will be disabled. 44 48 * @netdevs: atomic counter of netdevs using dummy netdev.

+1 -2

drivers/infiniband/hw/hfi1/netdev_rx.c

··· 78 78 uctxt->fast_handler = handle_receive_interrupt_napi_fp; 79 79 uctxt->slow_handler = handle_receive_interrupt_napi_sp; 80 80 hfi1_set_seq_cnt(uctxt, 1); 81 - uctxt->is_vnic = true; 82 81 83 82 hfi1_stats.sps_ctxts++; 84 83 ··· 426 427 427 428 /** 428 429 * hfi1_netdev_add_data - Registers data with unique identifier 429 - * to be requested later this is needed for VNIC and IPoIB VLANs 430 + * to be requested later this is needed for IPoIB VLANs 430 431 * implementations. 431 432 * This call is protected by mutex idr_lock. 432 433 *

-1

drivers/infiniband/hw/hfi1/qp.c

··· 404 404 hfi1_qp_schedule(qp); 405 405 } 406 406 spin_unlock_irqrestore(&qp->s_lock, flags); 407 - /* Notify hfi1_destroy_qp() if it is waiting. */ 408 407 rvt_put_qp(qp); 409 408 } 410 409

+3 -11

drivers/infiniband/hw/hfi1/user_exp_rcv.c

··· 257 257 if (tinfo->length == 0) 258 258 return -EINVAL; 259 259 260 - tidbuf = kzalloc(sizeof(*tidbuf), GFP_KERNEL); 260 + tidbuf = kzalloc_flex(*tidbuf, psets, uctxt->expected_count); 261 261 if (!tidbuf) 262 262 return -ENOMEM; 263 263 ··· 265 265 tidbuf->vaddr = tinfo->vaddr; 266 266 tidbuf->length = tinfo->length; 267 267 tidbuf->npages = num_user_pages(tidbuf->vaddr, tidbuf->length); 268 - tidbuf->psets = kcalloc(uctxt->expected_count, sizeof(*tidbuf->psets), 269 - GFP_KERNEL); 270 - if (!tidbuf->psets) { 271 - ret = -ENOMEM; 272 - goto fail_release_mem; 273 - } 274 268 275 269 if (fd->use_mn) { 276 270 ret = mmu_interval_notifier_insert( ··· 300 306 } 301 307 302 308 ngroups = pageset_count / dd->rcv_entries.group_size; 303 - tidlist = kcalloc(pageset_count, sizeof(*tidlist), GFP_KERNEL); 309 + tidlist = kzalloc_objs(*tidlist, pageset_count); 304 310 if (!tidlist) { 305 311 ret = -ENOMEM; 306 312 goto fail_unreserve; ··· 442 448 if (fd->use_mn) 443 449 mmu_interval_notifier_remove(&tidbuf->notifier); 444 450 kfree(tidbuf->pages); 445 - kfree(tidbuf->psets); 446 451 kfree(tidbuf); 447 452 kfree(tidlist); 448 453 return 0; ··· 464 471 unpin_rcv_pages(fd, tidbuf, NULL, 0, pinned, false); 465 472 fail_release_mem: 466 473 kfree(tidbuf->pages); 467 - kfree(tidbuf->psets); 468 474 kfree(tidbuf); 469 475 kfree(tidlist); 470 476 return ret; ··· 519 527 * for a long time. 520 528 * Copy the data to a local buffer so we can release the lock. 521 529 */ 522 - array = kcalloc(uctxt->expected_count, sizeof(*array), GFP_KERNEL); 530 + array = kzalloc_objs(*array, uctxt->expected_count); 523 531 if (!array) 524 532 return -EFAULT; 525 533

+1 -1

drivers/infiniband/hw/hfi1/user_exp_rcv.h

··· 22 22 unsigned long length; 23 23 unsigned int npages; 24 24 struct page **pages; 25 - struct tid_pageset *psets; 26 25 unsigned int n_psets; 26 + struct tid_pageset psets[]; 27 27 }; 28 28 29 29 struct tid_rb_node {

-2

drivers/infiniband/hw/hfi1/verbs.c

··· 21 21 #include "qp.h" 22 22 #include "verbs_txreq.h" 23 23 #include "debugfs.h" 24 - #include "vnic.h" 25 24 #include "fault.h" 26 25 #include "affinity.h" 27 26 #include "ipoib.h" ··· 1728 1729 1729 1730 .alloc_hw_device_stats = hfi1_alloc_hw_device_stats, 1730 1731 .alloc_hw_port_stats = hfi_alloc_hw_port_stats, 1731 - .alloc_rdma_netdev = hfi1_vnic_alloc_rn, 1732 1732 .device_group = &ib_hfi1_attr_group, 1733 1733 .get_dev_fw_str = hfi1_get_dev_fw_str, 1734 1734 .get_hw_stats = get_hw_stats,

-126

drivers/infiniband/hw/hfi1/vnic.h

··· 1 - /* SPDX-License-Identifier: GPL-2.0 OR BSD-3-Clause */ 2 - /* 3 - * Copyright(c) 2017 - 2020 Intel Corporation. 4 - */ 5 - 6 - #ifndef _HFI1_VNIC_H 7 - #define _HFI1_VNIC_H 8 - #include <rdma/opa_vnic.h> 9 - #include "hfi.h" 10 - #include "sdma.h" 11 - 12 - #define HFI1_VNIC_MAX_TXQ 16 13 - #define HFI1_VNIC_MAX_PAD 12 14 - 15 - /* L4 header definitions */ 16 - #define HFI1_VNIC_L4_HDR_OFFSET OPA_VNIC_L2_HDR_LEN 17 - 18 - #define HFI1_VNIC_GET_L4_HDR(data) \ 19 - (*((u16 *)((u8 *)(data) + HFI1_VNIC_L4_HDR_OFFSET))) 20 - 21 - #define HFI1_VNIC_GET_VESWID(data) \ 22 - (HFI1_VNIC_GET_L4_HDR(data) & 0xFFF) 23 - 24 - /* Service class */ 25 - #define HFI1_VNIC_SC_OFFSET_LOW 6 26 - #define HFI1_VNIC_SC_OFFSET_HI 7 27 - #define HFI1_VNIC_SC_SHIFT 4 28 - 29 - #define HFI1_VNIC_MAX_QUEUE 16 30 - #define HFI1_NUM_VNIC_CTXT 8 31 - 32 - /** 33 - * struct hfi1_vnic_sdma - VNIC per Tx ring SDMA information 34 - * @dd - device data pointer 35 - * @sde - sdma engine 36 - * @vinfo - vnic info pointer 37 - * @wait - iowait structure 38 - * @stx - sdma tx request 39 - * @state - vnic Tx ring SDMA state 40 - * @q_idx - vnic Tx queue index 41 - */ 42 - struct hfi1_vnic_sdma { 43 - struct hfi1_devdata *dd; 44 - struct sdma_engine *sde; 45 - struct hfi1_vnic_vport_info *vinfo; 46 - struct iowait wait; 47 - struct sdma_txreq stx; 48 - unsigned int state; 49 - u8 q_idx; 50 - bool pkts_sent; 51 - }; 52 - 53 - /** 54 - * struct hfi1_vnic_rx_queue - HFI1 VNIC receive queue 55 - * @idx: queue index 56 - * @vinfo: pointer to vport information 57 - * @netdev: network device 58 - * @napi: netdev napi structure 59 - * @skbq: queue of received socket buffers 60 - */ 61 - struct hfi1_vnic_rx_queue { 62 - u8 idx; 63 - struct hfi1_vnic_vport_info *vinfo; 64 - struct net_device *netdev; 65 - struct napi_struct napi; 66 - }; 67 - 68 - /** 69 - * struct hfi1_vnic_vport_info - HFI1 VNIC virtual port information 70 - * @dd: device data pointer 71 - * @netdev: net device pointer 72 - * @flags: state 
flags 73 - * @lock: vport lock 74 - * @num_tx_q: number of transmit queues 75 - * @num_rx_q: number of receive queues 76 - * @vesw_id: virtual switch id 77 - * @rxq: Array of receive queues 78 - * @stats: per queue stats 79 - * @sdma: VNIC SDMA structure per TXQ 80 - */ 81 - struct hfi1_vnic_vport_info { 82 - struct hfi1_devdata *dd; 83 - struct net_device *netdev; 84 - unsigned long flags; 85 - 86 - /* Lock used around state updates */ 87 - struct mutex lock; 88 - 89 - u8 num_tx_q; 90 - u8 num_rx_q; 91 - u16 vesw_id; 92 - struct hfi1_vnic_rx_queue rxq[HFI1_NUM_VNIC_CTXT]; 93 - 94 - struct opa_vnic_stats stats[HFI1_VNIC_MAX_QUEUE]; 95 - struct hfi1_vnic_sdma sdma[HFI1_VNIC_MAX_TXQ]; 96 - }; 97 - 98 - #define v_dbg(format, arg...) \ 99 - netdev_dbg(vinfo->netdev, format, ## arg) 100 - #define v_err(format, arg...) \ 101 - netdev_err(vinfo->netdev, format, ## arg) 102 - #define v_info(format, arg...) \ 103 - netdev_info(vinfo->netdev, format, ## arg) 104 - 105 - /* vnic hfi1 internal functions */ 106 - void hfi1_vnic_setup(struct hfi1_devdata *dd); 107 - int hfi1_vnic_txreq_init(struct hfi1_devdata *dd); 108 - void hfi1_vnic_txreq_deinit(struct hfi1_devdata *dd); 109 - 110 - void hfi1_vnic_bypass_rcv(struct hfi1_packet *packet); 111 - void hfi1_vnic_sdma_init(struct hfi1_vnic_vport_info *vinfo); 112 - bool hfi1_vnic_sdma_write_avail(struct hfi1_vnic_vport_info *vinfo, 113 - u8 q_idx); 114 - 115 - /* vnic rdma netdev operations */ 116 - struct net_device *hfi1_vnic_alloc_rn(struct ib_device *device, 117 - u32 port_num, 118 - enum rdma_netdev_t type, 119 - const char *name, 120 - unsigned char name_assign_type, 121 - void (*setup)(struct net_device *)); 122 - int hfi1_vnic_send_dma(struct hfi1_devdata *dd, u8 q_idx, 123 - struct hfi1_vnic_vport_info *vinfo, 124 - struct sk_buff *skb, u64 pbc, u8 plen); 125 - 126 - #endif /* _HFI1_VNIC_H */

-615

drivers/infiniband/hw/hfi1/vnic_main.c

··· 1 - // SPDX-License-Identifier: GPL-2.0 OR BSD-3-Clause 2 - /* 3 - * Copyright(c) 2017 - 2020 Intel Corporation. 4 - */ 5 - 6 - /* 7 - * This file contains HFI1 support for VNIC functionality 8 - */ 9 - 10 - #include <linux/io.h> 11 - #include <linux/if_vlan.h> 12 - 13 - #include "vnic.h" 14 - #include "netdev.h" 15 - 16 - #define HFI_TX_TIMEOUT_MS 1000 17 - 18 - #define HFI1_VNIC_RCV_Q_SIZE 1024 19 - 20 - #define HFI1_VNIC_UP 0 21 - 22 - static DEFINE_SPINLOCK(vport_cntr_lock); 23 - 24 - #define SUM_GRP_COUNTERS(stats, qstats, x_grp) do { \ 25 - u64 *src64, *dst64; \ 26 - for (src64 = &qstats->x_grp.unicast, \ 27 - dst64 = &stats->x_grp.unicast; \ 28 - dst64 <= &stats->x_grp.s_1519_max;) { \ 29 - *dst64++ += *src64++; \ 30 - } \ 31 - } while (0) 32 - 33 - #define VNIC_MASK (0xFF) 34 - #define VNIC_ID(val) ((1ull << 24) | ((val) & VNIC_MASK)) 35 - 36 - /* hfi1_vnic_update_stats - update statistics */ 37 - static void hfi1_vnic_update_stats(struct hfi1_vnic_vport_info *vinfo, 38 - struct opa_vnic_stats *stats) 39 - { 40 - struct net_device *netdev = vinfo->netdev; 41 - u8 i; 42 - 43 - /* add tx counters on different queues */ 44 - for (i = 0; i < vinfo->num_tx_q; i++) { 45 - struct opa_vnic_stats *qstats = &vinfo->stats[i]; 46 - struct rtnl_link_stats64 *qnstats = &vinfo->stats[i].netstats; 47 - 48 - stats->netstats.tx_fifo_errors += qnstats->tx_fifo_errors; 49 - stats->netstats.tx_carrier_errors += qnstats->tx_carrier_errors; 50 - stats->tx_drop_state += qstats->tx_drop_state; 51 - stats->tx_dlid_zero += qstats->tx_dlid_zero; 52 - 53 - SUM_GRP_COUNTERS(stats, qstats, tx_grp); 54 - stats->netstats.tx_packets += qnstats->tx_packets; 55 - stats->netstats.tx_bytes += qnstats->tx_bytes; 56 - } 57 - 58 - /* add rx counters on different queues */ 59 - for (i = 0; i < vinfo->num_rx_q; i++) { 60 - struct opa_vnic_stats *qstats = &vinfo->stats[i]; 61 - struct rtnl_link_stats64 *qnstats = &vinfo->stats[i].netstats; 62 - 63 - stats->netstats.rx_fifo_errors += 
qnstats->rx_fifo_errors; 64 - stats->netstats.rx_nohandler += qnstats->rx_nohandler; 65 - stats->rx_drop_state += qstats->rx_drop_state; 66 - stats->rx_oversize += qstats->rx_oversize; 67 - stats->rx_runt += qstats->rx_runt; 68 - 69 - SUM_GRP_COUNTERS(stats, qstats, rx_grp); 70 - stats->netstats.rx_packets += qnstats->rx_packets; 71 - stats->netstats.rx_bytes += qnstats->rx_bytes; 72 - } 73 - 74 - stats->netstats.tx_errors = stats->netstats.tx_fifo_errors + 75 - stats->netstats.tx_carrier_errors + 76 - stats->tx_drop_state + stats->tx_dlid_zero; 77 - stats->netstats.tx_dropped = stats->netstats.tx_errors; 78 - 79 - stats->netstats.rx_errors = stats->netstats.rx_fifo_errors + 80 - stats->netstats.rx_nohandler + 81 - stats->rx_drop_state + stats->rx_oversize + 82 - stats->rx_runt; 83 - stats->netstats.rx_dropped = stats->netstats.rx_errors; 84 - 85 - netdev->stats.tx_packets = stats->netstats.tx_packets; 86 - netdev->stats.tx_bytes = stats->netstats.tx_bytes; 87 - netdev->stats.tx_fifo_errors = stats->netstats.tx_fifo_errors; 88 - netdev->stats.tx_carrier_errors = stats->netstats.tx_carrier_errors; 89 - netdev->stats.tx_errors = stats->netstats.tx_errors; 90 - netdev->stats.tx_dropped = stats->netstats.tx_dropped; 91 - 92 - netdev->stats.rx_packets = stats->netstats.rx_packets; 93 - netdev->stats.rx_bytes = stats->netstats.rx_bytes; 94 - netdev->stats.rx_fifo_errors = stats->netstats.rx_fifo_errors; 95 - netdev->stats.multicast = stats->rx_grp.mcastbcast; 96 - netdev->stats.rx_length_errors = stats->rx_oversize + stats->rx_runt; 97 - netdev->stats.rx_errors = stats->netstats.rx_errors; 98 - netdev->stats.rx_dropped = stats->netstats.rx_dropped; 99 - } 100 - 101 - /* update_len_counters - update pkt's len histogram counters */ 102 - static inline void update_len_counters(struct opa_vnic_grp_stats *grp, 103 - int len) 104 - { 105 - /* account for 4 byte FCS */ 106 - if (len >= 1515) 107 - grp->s_1519_max++; 108 - else if (len >= 1020) 109 - grp->s_1024_1518++; 110 - 
else if (len >= 508) 111 - grp->s_512_1023++; 112 - else if (len >= 252) 113 - grp->s_256_511++; 114 - else if (len >= 124) 115 - grp->s_128_255++; 116 - else if (len >= 61) 117 - grp->s_65_127++; 118 - else 119 - grp->s_64++; 120 - } 121 - 122 - /* hfi1_vnic_update_tx_counters - update transmit counters */ 123 - static void hfi1_vnic_update_tx_counters(struct hfi1_vnic_vport_info *vinfo, 124 - u8 q_idx, struct sk_buff *skb, int err) 125 - { 126 - struct ethhdr *mac_hdr = (struct ethhdr *)skb_mac_header(skb); 127 - struct opa_vnic_stats *stats = &vinfo->stats[q_idx]; 128 - struct opa_vnic_grp_stats *tx_grp = &stats->tx_grp; 129 - u16 vlan_tci; 130 - 131 - stats->netstats.tx_packets++; 132 - stats->netstats.tx_bytes += skb->len + ETH_FCS_LEN; 133 - 134 - update_len_counters(tx_grp, skb->len); 135 - 136 - /* rest of the counts are for good packets only */ 137 - if (unlikely(err)) 138 - return; 139 - 140 - if (is_multicast_ether_addr(mac_hdr->h_dest)) 141 - tx_grp->mcastbcast++; 142 - else 143 - tx_grp->unicast++; 144 - 145 - if (!__vlan_get_tag(skb, &vlan_tci)) 146 - tx_grp->vlan++; 147 - else 148 - tx_grp->untagged++; 149 - } 150 - 151 - /* hfi1_vnic_update_rx_counters - update receive counters */ 152 - static void hfi1_vnic_update_rx_counters(struct hfi1_vnic_vport_info *vinfo, 153 - u8 q_idx, struct sk_buff *skb, int err) 154 - { 155 - struct ethhdr *mac_hdr = (struct ethhdr *)skb->data; 156 - struct opa_vnic_stats *stats = &vinfo->stats[q_idx]; 157 - struct opa_vnic_grp_stats *rx_grp = &stats->rx_grp; 158 - u16 vlan_tci; 159 - 160 - stats->netstats.rx_packets++; 161 - stats->netstats.rx_bytes += skb->len + ETH_FCS_LEN; 162 - 163 - update_len_counters(rx_grp, skb->len); 164 - 165 - /* rest of the counts are for good packets only */ 166 - if (unlikely(err)) 167 - return; 168 - 169 - if (is_multicast_ether_addr(mac_hdr->h_dest)) 170 - rx_grp->mcastbcast++; 171 - else 172 - rx_grp->unicast++; 173 - 174 - if (!__vlan_get_tag(skb, &vlan_tci)) 175 - rx_grp->vlan++; 176 
- else 177 - rx_grp->untagged++; 178 - } 179 - 180 - /* This function is overloaded for opa_vnic specific implementation */ 181 - static void hfi1_vnic_get_stats64(struct net_device *netdev, 182 - struct rtnl_link_stats64 *stats) 183 - { 184 - struct opa_vnic_stats *vstats = (struct opa_vnic_stats *)stats; 185 - struct hfi1_vnic_vport_info *vinfo = opa_vnic_dev_priv(netdev); 186 - 187 - hfi1_vnic_update_stats(vinfo, vstats); 188 - } 189 - 190 - static u64 create_bypass_pbc(u32 vl, u32 dw_len) 191 - { 192 - u64 pbc; 193 - 194 - pbc = ((u64)PBC_IHCRC_NONE << PBC_INSERT_HCRC_SHIFT) 195 - | PBC_INSERT_BYPASS_ICRC | PBC_CREDIT_RETURN 196 - | PBC_PACKET_BYPASS 197 - | ((vl & PBC_VL_MASK) << PBC_VL_SHIFT) 198 - | (dw_len & PBC_LENGTH_DWS_MASK) << PBC_LENGTH_DWS_SHIFT; 199 - 200 - return pbc; 201 - } 202 - 203 - /* hfi1_vnic_maybe_stop_tx - stop tx queue if required */ 204 - static void hfi1_vnic_maybe_stop_tx(struct hfi1_vnic_vport_info *vinfo, 205 - u8 q_idx) 206 - { 207 - netif_stop_subqueue(vinfo->netdev, q_idx); 208 - if (!hfi1_vnic_sdma_write_avail(vinfo, q_idx)) 209 - return; 210 - 211 - netif_start_subqueue(vinfo->netdev, q_idx); 212 - } 213 - 214 - static netdev_tx_t hfi1_netdev_start_xmit(struct sk_buff *skb, 215 - struct net_device *netdev) 216 - { 217 - struct hfi1_vnic_vport_info *vinfo = opa_vnic_dev_priv(netdev); 218 - u8 pad_len, q_idx = skb->queue_mapping; 219 - struct hfi1_devdata *dd = vinfo->dd; 220 - struct opa_vnic_skb_mdata *mdata; 221 - u32 pkt_len, total_len; 222 - int err = -EINVAL; 223 - u64 pbc; 224 - 225 - v_dbg("xmit: queue %d skb len %d\n", q_idx, skb->len); 226 - if (unlikely(!netif_oper_up(netdev))) { 227 - vinfo->stats[q_idx].tx_drop_state++; 228 - goto tx_finish; 229 - } 230 - 231 - /* take out meta data */ 232 - mdata = (struct opa_vnic_skb_mdata *)skb->data; 233 - skb_pull(skb, sizeof(*mdata)); 234 - if (unlikely(mdata->flags & OPA_VNIC_SKB_MDATA_ENCAP_ERR)) { 235 - vinfo->stats[q_idx].tx_dlid_zero++; 236 - goto tx_finish; 237 - } 238 - 
239 - /* add tail padding (for 8 bytes size alignment) and icrc */ 240 - pad_len = -(skb->len + OPA_VNIC_ICRC_TAIL_LEN) & 0x7; 241 - pad_len += OPA_VNIC_ICRC_TAIL_LEN; 242 - 243 - /* 244 - * pkt_len is how much data we have to write, includes header and data. 245 - * total_len is length of the packet in Dwords plus the PBC should not 246 - * include the CRC. 247 - */ 248 - pkt_len = (skb->len + pad_len) >> 2; 249 - total_len = pkt_len + 2; /* PBC + packet */ 250 - 251 - pbc = create_bypass_pbc(mdata->vl, total_len); 252 - 253 - skb_get(skb); 254 - v_dbg("pbc 0x%016llX len %d pad_len %d\n", pbc, skb->len, pad_len); 255 - err = dd->process_vnic_dma_send(dd, q_idx, vinfo, skb, pbc, pad_len); 256 - if (unlikely(err)) { 257 - if (err == -ENOMEM) 258 - vinfo->stats[q_idx].netstats.tx_fifo_errors++; 259 - else if (err != -EBUSY) 260 - vinfo->stats[q_idx].netstats.tx_carrier_errors++; 261 - } 262 - /* remove the header before updating tx counters */ 263 - skb_pull(skb, OPA_VNIC_HDR_LEN); 264 - 265 - if (unlikely(err == -EBUSY)) { 266 - hfi1_vnic_maybe_stop_tx(vinfo, q_idx); 267 - dev_kfree_skb_any(skb); 268 - return NETDEV_TX_BUSY; 269 - } 270 - 271 - tx_finish: 272 - /* update tx counters */ 273 - hfi1_vnic_update_tx_counters(vinfo, q_idx, skb, err); 274 - dev_kfree_skb_any(skb); 275 - return NETDEV_TX_OK; 276 - } 277 - 278 - static u16 hfi1_vnic_select_queue(struct net_device *netdev, 279 - struct sk_buff *skb, 280 - struct net_device *sb_dev) 281 - { 282 - struct hfi1_vnic_vport_info *vinfo = opa_vnic_dev_priv(netdev); 283 - struct opa_vnic_skb_mdata *mdata; 284 - struct sdma_engine *sde; 285 - 286 - mdata = (struct opa_vnic_skb_mdata *)skb->data; 287 - sde = sdma_select_engine_vl(vinfo->dd, mdata->entropy, mdata->vl); 288 - return sde->this_idx; 289 - } 290 - 291 - /* hfi1_vnic_decap_skb - strip OPA header from the skb (ethernet) packet */ 292 - static inline int hfi1_vnic_decap_skb(struct hfi1_vnic_rx_queue *rxq, 293 - struct sk_buff *skb) 294 - { 295 - struct 
hfi1_vnic_vport_info *vinfo = rxq->vinfo; 296 - int max_len = vinfo->netdev->mtu + VLAN_ETH_HLEN; 297 - int rc = -EFAULT; 298 - 299 - skb_pull(skb, OPA_VNIC_HDR_LEN); 300 - 301 - /* Validate Packet length */ 302 - if (unlikely(skb->len > max_len)) 303 - vinfo->stats[rxq->idx].rx_oversize++; 304 - else if (unlikely(skb->len < ETH_ZLEN)) 305 - vinfo->stats[rxq->idx].rx_runt++; 306 - else 307 - rc = 0; 308 - return rc; 309 - } 310 - 311 - static struct hfi1_vnic_vport_info *get_vnic_port(struct hfi1_devdata *dd, 312 - int vesw_id) 313 - { 314 - int vnic_id = VNIC_ID(vesw_id); 315 - 316 - return hfi1_netdev_get_data(dd, vnic_id); 317 - } 318 - 319 - static struct hfi1_vnic_vport_info *get_first_vnic_port(struct hfi1_devdata *dd) 320 - { 321 - struct hfi1_vnic_vport_info *vinfo; 322 - int next_id = VNIC_ID(0); 323 - 324 - vinfo = hfi1_netdev_get_first_data(dd, &next_id); 325 - 326 - if (next_id > VNIC_ID(VNIC_MASK)) 327 - return NULL; 328 - 329 - return vinfo; 330 - } 331 - 332 - void hfi1_vnic_bypass_rcv(struct hfi1_packet *packet) 333 - { 334 - struct hfi1_devdata *dd = packet->rcd->dd; 335 - struct hfi1_vnic_vport_info *vinfo = NULL; 336 - struct hfi1_vnic_rx_queue *rxq; 337 - struct sk_buff *skb; 338 - int l4_type, vesw_id = -1, rc; 339 - u8 q_idx; 340 - unsigned char *pad_info; 341 - 342 - l4_type = hfi1_16B_get_l4(packet->ebuf); 343 - if (likely(l4_type == OPA_16B_L4_ETHR)) { 344 - vesw_id = HFI1_VNIC_GET_VESWID(packet->ebuf); 345 - vinfo = get_vnic_port(dd, vesw_id); 346 - 347 - /* 348 - * In case of invalid vesw id, count the error on 349 - * the first available vport. 
350 - */ 351 - if (unlikely(!vinfo)) { 352 - struct hfi1_vnic_vport_info *vinfo_tmp; 353 - 354 - vinfo_tmp = get_first_vnic_port(dd); 355 - if (vinfo_tmp) { 356 - spin_lock(&vport_cntr_lock); 357 - vinfo_tmp->stats[0].netstats.rx_nohandler++; 358 - spin_unlock(&vport_cntr_lock); 359 - } 360 - } 361 - } 362 - 363 - if (unlikely(!vinfo)) { 364 - dd_dev_warn(dd, "vnic rcv err: l4 %d vesw id %d ctx %d\n", 365 - l4_type, vesw_id, packet->rcd->ctxt); 366 - return; 367 - } 368 - 369 - q_idx = packet->rcd->vnic_q_idx; 370 - rxq = &vinfo->rxq[q_idx]; 371 - if (unlikely(!netif_oper_up(vinfo->netdev))) { 372 - vinfo->stats[q_idx].rx_drop_state++; 373 - return; 374 - } 375 - 376 - skb = netdev_alloc_skb(vinfo->netdev, packet->tlen); 377 - if (unlikely(!skb)) { 378 - vinfo->stats[q_idx].netstats.rx_fifo_errors++; 379 - return; 380 - } 381 - 382 - memcpy(skb->data, packet->ebuf, packet->tlen); 383 - skb_put(skb, packet->tlen); 384 - 385 - pad_info = skb->data + skb->len - 1; 386 - skb_trim(skb, (skb->len - OPA_VNIC_ICRC_TAIL_LEN - 387 - ((*pad_info) & 0x7))); 388 - 389 - rc = hfi1_vnic_decap_skb(rxq, skb); 390 - 391 - /* update rx counters */ 392 - hfi1_vnic_update_rx_counters(vinfo, rxq->idx, skb, rc); 393 - if (unlikely(rc)) { 394 - dev_kfree_skb_any(skb); 395 - return; 396 - } 397 - 398 - skb_checksum_none_assert(skb); 399 - skb->protocol = eth_type_trans(skb, rxq->netdev); 400 - 401 - napi_gro_receive(&rxq->napi, skb); 402 - } 403 - 404 - static int hfi1_vnic_up(struct hfi1_vnic_vport_info *vinfo) 405 - { 406 - struct hfi1_devdata *dd = vinfo->dd; 407 - struct net_device *netdev = vinfo->netdev; 408 - int rc; 409 - 410 - /* ensure virtual eth switch id is valid */ 411 - if (!vinfo->vesw_id) 412 - return -EINVAL; 413 - 414 - rc = hfi1_netdev_add_data(dd, VNIC_ID(vinfo->vesw_id), vinfo); 415 - if (rc < 0) 416 - return rc; 417 - 418 - rc = hfi1_netdev_rx_init(dd); 419 - if (rc) 420 - goto err_remove; 421 - 422 - netif_carrier_on(netdev); 423 - netif_tx_start_all_queues(netdev); 
424 - set_bit(HFI1_VNIC_UP, &vinfo->flags); 425 - 426 - return 0; 427 - 428 - err_remove: 429 - hfi1_netdev_remove_data(dd, VNIC_ID(vinfo->vesw_id)); 430 - return rc; 431 - } 432 - 433 - static void hfi1_vnic_down(struct hfi1_vnic_vport_info *vinfo) 434 - { 435 - struct hfi1_devdata *dd = vinfo->dd; 436 - 437 - clear_bit(HFI1_VNIC_UP, &vinfo->flags); 438 - netif_carrier_off(vinfo->netdev); 439 - netif_tx_disable(vinfo->netdev); 440 - hfi1_netdev_remove_data(dd, VNIC_ID(vinfo->vesw_id)); 441 - 442 - hfi1_netdev_rx_destroy(dd); 443 - } 444 - 445 - static int hfi1_netdev_open(struct net_device *netdev) 446 - { 447 - struct hfi1_vnic_vport_info *vinfo = opa_vnic_dev_priv(netdev); 448 - int rc; 449 - 450 - mutex_lock(&vinfo->lock); 451 - rc = hfi1_vnic_up(vinfo); 452 - mutex_unlock(&vinfo->lock); 453 - return rc; 454 - } 455 - 456 - static int hfi1_netdev_close(struct net_device *netdev) 457 - { 458 - struct hfi1_vnic_vport_info *vinfo = opa_vnic_dev_priv(netdev); 459 - 460 - mutex_lock(&vinfo->lock); 461 - if (test_bit(HFI1_VNIC_UP, &vinfo->flags)) 462 - hfi1_vnic_down(vinfo); 463 - mutex_unlock(&vinfo->lock); 464 - return 0; 465 - } 466 - 467 - static int hfi1_vnic_init(struct hfi1_vnic_vport_info *vinfo) 468 - { 469 - struct hfi1_devdata *dd = vinfo->dd; 470 - int rc = 0; 471 - 472 - mutex_lock(&hfi1_mutex); 473 - if (!dd->vnic_num_vports) { 474 - rc = hfi1_vnic_txreq_init(dd); 475 - if (rc) 476 - goto txreq_fail; 477 - } 478 - 479 - rc = hfi1_netdev_rx_init(dd); 480 - if (rc) { 481 - dd_dev_err(dd, "Unable to initialize netdev contexts\n"); 482 - goto alloc_fail; 483 - } 484 - 485 - hfi1_init_vnic_rsm(dd); 486 - 487 - dd->vnic_num_vports++; 488 - hfi1_vnic_sdma_init(vinfo); 489 - 490 - alloc_fail: 491 - if (!dd->vnic_num_vports) 492 - hfi1_vnic_txreq_deinit(dd); 493 - txreq_fail: 494 - mutex_unlock(&hfi1_mutex); 495 - return rc; 496 - } 497 - 498 - static void hfi1_vnic_deinit(struct hfi1_vnic_vport_info *vinfo) 499 - { 500 - struct hfi1_devdata *dd = vinfo->dd; 501 
- 502 - mutex_lock(&hfi1_mutex); 503 - if (--dd->vnic_num_vports == 0) { 504 - hfi1_deinit_vnic_rsm(dd); 505 - hfi1_vnic_txreq_deinit(dd); 506 - } 507 - mutex_unlock(&hfi1_mutex); 508 - hfi1_netdev_rx_destroy(dd); 509 - } 510 - 511 - static void hfi1_vnic_set_vesw_id(struct net_device *netdev, int id) 512 - { 513 - struct hfi1_vnic_vport_info *vinfo = opa_vnic_dev_priv(netdev); 514 - bool reopen = false; 515 - 516 - /* 517 - * If vesw_id is being changed, and if the vnic port is up, 518 - * reset the vnic port to ensure new vesw_id gets picked up 519 - */ 520 - if (id != vinfo->vesw_id) { 521 - mutex_lock(&vinfo->lock); 522 - if (test_bit(HFI1_VNIC_UP, &vinfo->flags)) { 523 - hfi1_vnic_down(vinfo); 524 - reopen = true; 525 - } 526 - 527 - vinfo->vesw_id = id; 528 - if (reopen) 529 - hfi1_vnic_up(vinfo); 530 - 531 - mutex_unlock(&vinfo->lock); 532 - } 533 - } 534 - 535 - /* netdev ops */ 536 - static const struct net_device_ops hfi1_netdev_ops = { 537 - .ndo_open = hfi1_netdev_open, 538 - .ndo_stop = hfi1_netdev_close, 539 - .ndo_start_xmit = hfi1_netdev_start_xmit, 540 - .ndo_select_queue = hfi1_vnic_select_queue, 541 - .ndo_get_stats64 = hfi1_vnic_get_stats64, 542 - }; 543 - 544 - static void hfi1_vnic_free_rn(struct net_device *netdev) 545 - { 546 - struct hfi1_vnic_vport_info *vinfo = opa_vnic_dev_priv(netdev); 547 - 548 - hfi1_vnic_deinit(vinfo); 549 - mutex_destroy(&vinfo->lock); 550 - free_netdev(netdev); 551 - } 552 - 553 - struct net_device *hfi1_vnic_alloc_rn(struct ib_device *device, 554 - u32 port_num, 555 - enum rdma_netdev_t type, 556 - const char *name, 557 - unsigned char name_assign_type, 558 - void (*setup)(struct net_device *)) 559 - { 560 - struct hfi1_devdata *dd = dd_from_ibdev(device); 561 - struct hfi1_vnic_vport_info *vinfo; 562 - struct net_device *netdev; 563 - struct rdma_netdev *rn; 564 - int i, size, rc; 565 - 566 - if (!dd->num_netdev_contexts) 567 - return ERR_PTR(-ENOMEM); 568 - 569 - if (!port_num || (port_num > dd->num_pports)) 570 
- return ERR_PTR(-EINVAL); 571 - 572 - if (type != RDMA_NETDEV_OPA_VNIC) 573 - return ERR_PTR(-EOPNOTSUPP); 574 - 575 - size = sizeof(struct opa_vnic_rdma_netdev) + sizeof(*vinfo); 576 - netdev = alloc_netdev_mqs(size, name, name_assign_type, setup, 577 - chip_sdma_engines(dd), 578 - dd->num_netdev_contexts); 579 - if (!netdev) 580 - return ERR_PTR(-ENOMEM); 581 - 582 - rn = netdev_priv(netdev); 583 - vinfo = opa_vnic_dev_priv(netdev); 584 - vinfo->dd = dd; 585 - vinfo->num_tx_q = chip_sdma_engines(dd); 586 - vinfo->num_rx_q = dd->num_netdev_contexts; 587 - vinfo->netdev = netdev; 588 - rn->free_rdma_netdev = hfi1_vnic_free_rn; 589 - rn->set_id = hfi1_vnic_set_vesw_id; 590 - 591 - netdev->features = NETIF_F_HIGHDMA | NETIF_F_SG; 592 - netdev->hw_features = netdev->features; 593 - netdev->vlan_features = netdev->features; 594 - netdev->watchdog_timeo = msecs_to_jiffies(HFI_TX_TIMEOUT_MS); 595 - netdev->netdev_ops = &hfi1_netdev_ops; 596 - mutex_init(&vinfo->lock); 597 - 598 - for (i = 0; i < vinfo->num_rx_q; i++) { 599 - struct hfi1_vnic_rx_queue *rxq = &vinfo->rxq[i]; 600 - 601 - rxq->idx = i; 602 - rxq->vinfo = vinfo; 603 - rxq->netdev = netdev; 604 - } 605 - 606 - rc = hfi1_vnic_init(vinfo); 607 - if (rc) 608 - goto init_fail; 609 - 610 - return netdev; 611 - init_fail: 612 - mutex_destroy(&vinfo->lock); 613 - free_netdev(netdev); 614 - return ERR_PTR(rc); 615 - }
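The removed transmit path pads each frame so that payload plus ICRC and tail byte ends on an 8-byte boundary. It then expresses the length in dwords, adding 2 dwords for the 64-bit PBC. The user-space sketch below reproduces only that arithmetic. The value 5 for OPA_VNIC_ICRC_TAIL_LEN (4-byte ICRC plus 1 tail byte) is an assumption taken from opa_vnic_encap.h, which this diff does not show, and vnic_tx_lengths() is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed value (opa_vnic_encap.h, not shown here): 4-byte ICRC + 1 tail byte. */
#define OPA_VNIC_ICRC_TAIL_LEN 5u

/* Hypothetical helper mirroring the length math in hfi1_netdev_start_xmit(). */
static void vnic_tx_lengths(uint32_t skb_len, uint8_t *pad_len,
			    uint32_t *total_dws)
{
	/* Bytes needed to round (len + ICRC + tail) up to a multiple of 8. */
	uint8_t pad = (uint8_t)(-(skb_len + OPA_VNIC_ICRC_TAIL_LEN) & 0x7);

	/* The pad length also reserves room for the ICRC and tail byte. */
	pad += OPA_VNIC_ICRC_TAIL_LEN;
	assert(((skb_len + pad) & 0x7) == 0);

	*pad_len = pad;
	/* Packet length in dwords, plus 2 dwords for the PBC. */
	*total_dws = ((skb_len + pad) >> 2) + 2;
}
```

The raw pad is 0 to 7 bytes and the ICRC/tail reservation adds 5, so pad_len never exceeds 12 bytes. For example, a 60-byte frame gets 12 bytes of pad and a 20-dword PBC length.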

-282

drivers/infiniband/hw/hfi1/vnic_sdma.c

··· 1 - // SPDX-License-Identifier: GPL-2.0 OR BSD-3-Clause 2 - /* 3 - * Copyright(c) 2017 - 2018 Intel Corporation. 4 - */ 5 - 6 - /* 7 - * This file contains HFI1 support for VNIC SDMA functionality 8 - */ 9 - 10 - #include "sdma.h" 11 - #include "vnic.h" 12 - 13 - #define HFI1_VNIC_SDMA_Q_ACTIVE BIT(0) 14 - #define HFI1_VNIC_SDMA_Q_DEFERRED BIT(1) 15 - 16 - #define HFI1_VNIC_TXREQ_NAME_LEN 32 17 - #define HFI1_VNIC_SDMA_DESC_WTRMRK 64 18 - 19 - /* 20 - * struct vnic_txreq - VNIC transmit descriptor 21 - * @txreq: sdma transmit request 22 - * @sdma: vnic sdma pointer 23 - * @skb: skb to send 24 - * @pad: pad buffer 25 - * @plen: pad length 26 - * @pbc_val: pbc value 27 - */ 28 - struct vnic_txreq { 29 - struct sdma_txreq txreq; 30 - struct hfi1_vnic_sdma *sdma; 31 - 32 - struct sk_buff *skb; 33 - unsigned char pad[HFI1_VNIC_MAX_PAD]; 34 - u16 plen; 35 - __le64 pbc_val; 36 - }; 37 - 38 - static void vnic_sdma_complete(struct sdma_txreq *txreq, 39 - int status) 40 - { 41 - struct vnic_txreq *tx = container_of(txreq, struct vnic_txreq, txreq); 42 - struct hfi1_vnic_sdma *vnic_sdma = tx->sdma; 43 - 44 - sdma_txclean(vnic_sdma->dd, txreq); 45 - dev_kfree_skb_any(tx->skb); 46 - kmem_cache_free(vnic_sdma->dd->vnic.txreq_cache, tx); 47 - } 48 - 49 - static noinline int build_vnic_ulp_payload(struct sdma_engine *sde, 50 - struct vnic_txreq *tx) 51 - { 52 - int i, ret = 0; 53 - 54 - ret = sdma_txadd_kvaddr( 55 - sde->dd, 56 - &tx->txreq, 57 - tx->skb->data, 58 - skb_headlen(tx->skb)); 59 - if (unlikely(ret)) 60 - goto bail_txadd; 61 - 62 - for (i = 0; i < skb_shinfo(tx->skb)->nr_frags; i++) { 63 - skb_frag_t *frag = &skb_shinfo(tx->skb)->frags[i]; 64 - 65 - /* combine physically continuous fragments later? 
*/ 66 - ret = sdma_txadd_page(sde->dd, 67 - &tx->txreq, 68 - skb_frag_page(frag), 69 - skb_frag_off(frag), 70 - skb_frag_size(frag), 71 - NULL, NULL, NULL); 72 - if (unlikely(ret)) 73 - goto bail_txadd; 74 - } 75 - 76 - if (tx->plen) 77 - ret = sdma_txadd_kvaddr(sde->dd, &tx->txreq, 78 - tx->pad + HFI1_VNIC_MAX_PAD - tx->plen, 79 - tx->plen); 80 - 81 - bail_txadd: 82 - return ret; 83 - } 84 - 85 - static int build_vnic_tx_desc(struct sdma_engine *sde, 86 - struct vnic_txreq *tx, 87 - u64 pbc) 88 - { 89 - int ret = 0; 90 - u16 hdrbytes = 2 << 2; /* PBC */ 91 - 92 - ret = sdma_txinit_ahg( 93 - &tx->txreq, 94 - 0, 95 - hdrbytes + tx->skb->len + tx->plen, 96 - 0, 97 - 0, 98 - NULL, 99 - 0, 100 - vnic_sdma_complete); 101 - if (unlikely(ret)) 102 - goto bail_txadd; 103 - 104 - /* add pbc */ 105 - tx->pbc_val = cpu_to_le64(pbc); 106 - ret = sdma_txadd_kvaddr( 107 - sde->dd, 108 - &tx->txreq, 109 - &tx->pbc_val, 110 - hdrbytes); 111 - if (unlikely(ret)) 112 - goto bail_txadd; 113 - 114 - /* add the ulp payload */ 115 - ret = build_vnic_ulp_payload(sde, tx); 116 - bail_txadd: 117 - return ret; 118 - } 119 - 120 - /* setup the last plen bypes of pad */ 121 - static inline void hfi1_vnic_update_pad(unsigned char *pad, u8 plen) 122 - { 123 - pad[HFI1_VNIC_MAX_PAD - 1] = plen - OPA_VNIC_ICRC_TAIL_LEN; 124 - } 125 - 126 - int hfi1_vnic_send_dma(struct hfi1_devdata *dd, u8 q_idx, 127 - struct hfi1_vnic_vport_info *vinfo, 128 - struct sk_buff *skb, u64 pbc, u8 plen) 129 - { 130 - struct hfi1_vnic_sdma *vnic_sdma = &vinfo->sdma[q_idx]; 131 - struct sdma_engine *sde = vnic_sdma->sde; 132 - struct vnic_txreq *tx; 133 - int ret = -ECOMM; 134 - 135 - if (unlikely(READ_ONCE(vnic_sdma->state) != HFI1_VNIC_SDMA_Q_ACTIVE)) 136 - goto tx_err; 137 - 138 - if (unlikely(!sde || !sdma_running(sde))) 139 - goto tx_err; 140 - 141 - tx = kmem_cache_alloc(dd->vnic.txreq_cache, GFP_ATOMIC); 142 - if (unlikely(!tx)) { 143 - ret = -ENOMEM; 144 - goto tx_err; 145 - } 146 - 147 - tx->sdma = vnic_sdma; 
148 - tx->skb = skb; 149 - hfi1_vnic_update_pad(tx->pad, plen); 150 - tx->plen = plen; 151 - ret = build_vnic_tx_desc(sde, tx, pbc); 152 - if (unlikely(ret)) 153 - goto free_desc; 154 - 155 - ret = sdma_send_txreq(sde, iowait_get_ib_work(&vnic_sdma->wait), 156 - &tx->txreq, vnic_sdma->pkts_sent); 157 - /* When -ECOMM, sdma callback will be called with ABORT status */ 158 - if (unlikely(ret && unlikely(ret != -ECOMM))) 159 - goto free_desc; 160 - 161 - if (!ret) { 162 - vnic_sdma->pkts_sent = true; 163 - iowait_starve_clear(vnic_sdma->pkts_sent, &vnic_sdma->wait); 164 - } 165 - return ret; 166 - 167 - free_desc: 168 - sdma_txclean(dd, &tx->txreq); 169 - kmem_cache_free(dd->vnic.txreq_cache, tx); 170 - tx_err: 171 - if (ret != -EBUSY) 172 - dev_kfree_skb_any(skb); 173 - else 174 - vnic_sdma->pkts_sent = false; 175 - return ret; 176 - } 177 - 178 - /* 179 - * hfi1_vnic_sdma_sleep - vnic sdma sleep function 180 - * 181 - * This function gets called from sdma_send_txreq() when there are not enough 182 - * sdma descriptors available to send the packet. It adds Tx queue's wait 183 - * structure to sdma engine's dmawait list to be woken up when descriptors 184 - * become available. 
185 - */ 186 - static int hfi1_vnic_sdma_sleep(struct sdma_engine *sde, 187 - struct iowait_work *wait, 188 - struct sdma_txreq *txreq, 189 - uint seq, 190 - bool pkts_sent) 191 - { 192 - struct hfi1_vnic_sdma *vnic_sdma = 193 - container_of(wait->iow, struct hfi1_vnic_sdma, wait); 194 - 195 - write_seqlock(&sde->waitlock); 196 - if (sdma_progress(sde, seq, txreq)) { 197 - write_sequnlock(&sde->waitlock); 198 - return -EAGAIN; 199 - } 200 - 201 - vnic_sdma->state = HFI1_VNIC_SDMA_Q_DEFERRED; 202 - if (list_empty(&vnic_sdma->wait.list)) { 203 - iowait_get_priority(wait->iow); 204 - iowait_queue(pkts_sent, wait->iow, &sde->dmawait); 205 - } 206 - write_sequnlock(&sde->waitlock); 207 - return -EBUSY; 208 - } 209 - 210 - /* 211 - * hfi1_vnic_sdma_wakeup - vnic sdma wakeup function 212 - * 213 - * This function gets called when SDMA descriptors becomes available and Tx 214 - * queue's wait structure was previously added to sdma engine's dmawait list. 215 - * It notifies the upper driver about Tx queue wakeup. 
216 - */ 217 - static void hfi1_vnic_sdma_wakeup(struct iowait *wait, int reason) 218 - { 219 - struct hfi1_vnic_sdma *vnic_sdma = 220 - container_of(wait, struct hfi1_vnic_sdma, wait); 221 - struct hfi1_vnic_vport_info *vinfo = vnic_sdma->vinfo; 222 - 223 - vnic_sdma->state = HFI1_VNIC_SDMA_Q_ACTIVE; 224 - if (__netif_subqueue_stopped(vinfo->netdev, vnic_sdma->q_idx)) 225 - netif_wake_subqueue(vinfo->netdev, vnic_sdma->q_idx); 226 - }; 227 - 228 - inline bool hfi1_vnic_sdma_write_avail(struct hfi1_vnic_vport_info *vinfo, 229 - u8 q_idx) 230 - { 231 - struct hfi1_vnic_sdma *vnic_sdma = &vinfo->sdma[q_idx]; 232 - 233 - return (READ_ONCE(vnic_sdma->state) == HFI1_VNIC_SDMA_Q_ACTIVE); 234 - } 235 - 236 - void hfi1_vnic_sdma_init(struct hfi1_vnic_vport_info *vinfo) 237 - { 238 - int i; 239 - 240 - for (i = 0; i < vinfo->num_tx_q; i++) { 241 - struct hfi1_vnic_sdma *vnic_sdma = &vinfo->sdma[i]; 242 - 243 - iowait_init(&vnic_sdma->wait, 0, NULL, NULL, 244 - hfi1_vnic_sdma_sleep, 245 - hfi1_vnic_sdma_wakeup, NULL, NULL); 246 - vnic_sdma->sde = &vinfo->dd->per_sdma[i]; 247 - vnic_sdma->dd = vinfo->dd; 248 - vnic_sdma->vinfo = vinfo; 249 - vnic_sdma->q_idx = i; 250 - vnic_sdma->state = HFI1_VNIC_SDMA_Q_ACTIVE; 251 - 252 - /* Add a free descriptor watermark for wakeups */ 253 - if (vnic_sdma->sde->descq_cnt > HFI1_VNIC_SDMA_DESC_WTRMRK) { 254 - struct iowait_work *work; 255 - 256 - INIT_LIST_HEAD(&vnic_sdma->stx.list); 257 - vnic_sdma->stx.num_desc = HFI1_VNIC_SDMA_DESC_WTRMRK; 258 - work = iowait_get_ib_work(&vnic_sdma->wait); 259 - list_add_tail(&vnic_sdma->stx.list, &work->tx_head); 260 - } 261 - } 262 - } 263 - 264 - int hfi1_vnic_txreq_init(struct hfi1_devdata *dd) 265 - { 266 - char buf[HFI1_VNIC_TXREQ_NAME_LEN]; 267 - 268 - snprintf(buf, sizeof(buf), "hfi1_%u_vnic_txreq_cache", dd->unit); 269 - dd->vnic.txreq_cache = kmem_cache_create(buf, 270 - sizeof(struct vnic_txreq), 271 - 0, SLAB_HWCACHE_ALIGN, 272 - NULL); 273 - if (!dd->vnic.txreq_cache) 274 - return -ENOMEM; 
275 - return 0; 276 - } 277 - 278 - void hfi1_vnic_txreq_deinit(struct hfi1_devdata *dd) 279 - { 280 - kmem_cache_destroy(dd->vnic.txreq_cache); 281 - dd->vnic.txreq_cache = NULL; 282 - }
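On the SDMA side, the pad bytes come from the end of a fixed pad buffer. The last byte of that buffer is set to plen - OPA_VNIC_ICRC_TAIL_LEN. The receive path in hfi1_vnic_bypass_rcv() reads that tail byte back and trims OPA_VNIC_ICRC_TAIL_LEN plus (tail & 0x7) bytes. The sketch below models that round trip on a plain byte array. HFI1_VNIC_MAX_PAD = 12 and OPA_VNIC_ICRC_TAIL_LEN = 5 are assumed values from headers not shown here, and the function names are hypothetical. The bytes reserved for the ICRC are treated as opaque filler rather than a real CRC.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed values (vnic.h / opa_vnic_encap.h, not shown in this diff). */
#define OPA_VNIC_ICRC_TAIL_LEN 5u
#define HFI1_VNIC_MAX_PAD      12u

/* Sender: append the last plen bytes of the pad buffer after the payload.
 * The final byte records how many pure pad bytes precede ICRC + tail. */
static size_t vnic_append_pad(uint8_t *wire, const uint8_t *payload,
			      size_t len, uint8_t plen)
{
	uint8_t pad[HFI1_VNIC_MAX_PAD] = { 0 };

	assert(plen >= OPA_VNIC_ICRC_TAIL_LEN && plen <= HFI1_VNIC_MAX_PAD);
	pad[HFI1_VNIC_MAX_PAD - 1] = plen - OPA_VNIC_ICRC_TAIL_LEN;
	memcpy(wire, payload, len);
	memcpy(wire + len, pad + HFI1_VNIC_MAX_PAD - plen, plen);
	return len + plen;
}

/* Receiver: recover the payload length from the tail byte. */
static size_t vnic_strip_pad(const uint8_t *wire, size_t wire_len)
{
	uint8_t pad_info = wire[wire_len - 1];

	return wire_len - OPA_VNIC_ICRC_TAIL_LEN - (pad_info & 0x7);
}
```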

+1 -1

drivers/infiniband/hw/hns/hns_roce_alloc.c

··· 32 32 */ 33 33 34 34 #include <linux/vmalloc.h> 35 - #include <rdma/ib_umem.h> 35 + #include <rdma/iter.h> 36 36 #include "hns_roce_device.h" 37 37 38 38 void hns_roce_buf_free(struct hns_roce_dev *hr_dev, struct hns_roce_buf *buf)

+1 -15

drivers/infiniband/hw/hns/hns_roce_cq.c

··· 350 350 return 0; 351 351 } 352 352 353 - static int get_cq_ucmd(struct hns_roce_cq *hr_cq, struct ib_udata *udata, 354 - struct hns_roce_ib_create_cq *ucmd) 355 - { 356 - struct ib_device *ibdev = hr_cq->ib_cq.device; 357 - int ret; 358 - 359 - ret = ib_copy_from_udata(ucmd, udata, min(udata->inlen, sizeof(*ucmd))); 360 - if (ret) { 361 - ibdev_err(ibdev, "failed to copy CQ udata, ret = %d.\n", ret); 362 - return ret; 363 - } 364 - 365 - return 0; 366 - } 367 353 368 354 static void set_cq_param(struct hns_roce_cq *hr_cq, u32 cq_entries, int vector, 369 355 struct hns_roce_ib_create_cq *ucmd) ··· 414 428 goto err_out; 415 429 416 430 if (udata) { 417 - ret = get_cq_ucmd(hr_cq, udata, &ucmd); 431 + ret = ib_copy_validate_udata_in(udata, ucmd, db_addr); 418 432 if (ret) 419 433 goto err_out; 420 434 }
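The hns hunks replace open-coded ib_copy_from_udata(..., min(udata->inlen, sizeof(...))) calls with ib_copy_validate_udata_in(udata, cmd, last_member). This diff does not show that helper's exact rules. The sketch below is a hypothetical user-space model of the extensible-struct convention used elsewhere in the kernel, for example by copy_struct_from_user(). Under that model, the input must cover the required fields, a shorter input is zero-filled, and a longer input is rejected unless the extra bytes are zero. Whether the RDMA helper behaves exactly like this, including its error codes, is an assumption. struct demo_cmd and copy_validate_in() are illustrative names only.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Same meaning as the kernel's offsetofend(): offset just past MEMBER. */
#define offsetofend(TYPE, MEMBER) \
	(offsetof(TYPE, MEMBER) + sizeof(((TYPE *)0)->MEMBER))

/* Hypothetical command layout; "reserved" plays the role of a trailing field. */
struct demo_cmd {
	unsigned long long buf_addr;
	unsigned long long db_addr;
	unsigned int reserved;
};

/*
 * Hypothetical model of a copy-and-validate helper:
 *  - inlen must cover at least min_len bytes (the required fields),
 *  - bytes the caller did not send are zero-filled,
 *  - bytes beyond ksize are accepted only if they are all zero.
 */
static int copy_validate_in(void *dst, size_t ksize, const void *src,
			    size_t inlen, size_t min_len)
{
	const unsigned char *in = src;
	size_t i;

	assert(dst && src && min_len <= ksize);
	if (inlen < min_len)
		return -EINVAL;
	for (i = ksize; i < inlen; i++)
		if (in[i])
			return -E2BIG;

	memset(dst, 0, ksize);
	memcpy(dst, src, inlen < ksize ? inlen : ksize);
	return 0;
}
```

Zero-filling inside the helper would also explain why the hns_roce_alloc_ucontext() hunk can drop the `= {}` initializer on ucmd, though that too is an inference from this diff.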

+2 -5

drivers/infiniband/hw/hns/hns_roce_hem.c

··· 771 771 unsigned long num_bt_l1; 772 772 773 773 num_bt_l1 = DIV_ROUND_UP(num_hem, bt_chunk_num); 774 - table->bt_l1 = kcalloc(num_bt_l1, 775 - sizeof(*table->bt_l1), 776 - GFP_KERNEL); 774 + table->bt_l1 = kzalloc_objs(*table->bt_l1, num_bt_l1); 777 775 if (!table->bt_l1) 778 776 goto err_kcalloc_bt_l1; 779 777 ··· 784 786 785 787 if (check_whether_bt_num_2(type, hop_num) || 786 788 check_whether_bt_num_3(type, hop_num)) { 787 - table->bt_l0 = kcalloc(num_bt_l0, sizeof(*table->bt_l0), 788 - GFP_KERNEL); 789 + table->bt_l0 = kzalloc_objs(*table->bt_l0, num_bt_l0); 789 790 if (!table->bt_l0) 790 791 goto err_kcalloc_bt_l0; 791 792
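Judging from the hunks here, kzalloc_objs(*ptr, n) replaces kcalloc(n, sizeof(*ptr), GFP_KERNEL). It derives the element size from the expression instead of from a separately written sizeof. The macro below is a hypothetical user-space analogue built on calloc() and the GNU __typeof__ extension. It is not the kernel definition, and it accepts only an expression, not a type name.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical analogue of kzalloc_objs(): element size and result type both
 * come from the expression, so they cannot drift from the pointer's type.
 * calloc() returns zero-filled memory. */
#define calloc_objs(var, count) \
	((__typeof__(&(var)))calloc((count), sizeof(var)))

/* Hypothetical use: a zeroed work-request ID array for wqe_cnt entries. */
static unsigned long long *alloc_wrid(unsigned int wqe_cnt)
{
	unsigned long long *wrid;

	assert(wqe_cnt > 0);
	/* sizeof/typeof do not evaluate *wrid, so the uninitialized pointer is fine. */
	wrid = calloc_objs(*wrid, wqe_cnt);
	return wrid;
}
```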

+3 -3

drivers/infiniband/hw/hns/hns_roce_main.c

··· 36 36 #include <rdma/ib_smi.h> 37 37 #include <rdma/ib_user_verbs.h> 38 38 #include <rdma/ib_cache.h> 39 + #include <rdma/uverbs_ioctl.h> 39 40 #include "hns_roce_common.h" 40 41 #include "hns_roce_device.h" 41 42 #include "hns_roce_hem.h" ··· 425 424 struct hns_roce_ucontext *context = to_hr_ucontext(uctx); 426 425 struct hns_roce_dev *hr_dev = to_hr_dev(uctx->device); 427 426 struct hns_roce_ib_alloc_ucontext_resp resp = {}; 428 - struct hns_roce_ib_alloc_ucontext ucmd = {}; 427 + struct hns_roce_ib_alloc_ucontext ucmd; 429 428 int ret = -EAGAIN; 430 429 431 430 if (!hr_dev->active) ··· 434 433 resp.qp_tab_size = hr_dev->caps.num_qps; 435 434 resp.srq_tab_size = hr_dev->caps.num_srqs; 436 435 437 - ret = ib_copy_from_udata(&ucmd, udata, 438 - min(udata->inlen, sizeof(ucmd))); 436 + ret = ib_copy_validate_udata_in(udata, ucmd, reserved); 439 437 if (ret) 440 438 goto error_out; 441 439

+8 -15

drivers/infiniband/hw/hns/hns_roce_qp.c

··· 1023 1023 static int alloc_kernel_wrid(struct hns_roce_dev *hr_dev, 1024 1024 struct hns_roce_qp *hr_qp) 1025 1025 { 1026 - struct ib_device *ibdev = &hr_dev->ib_dev; 1027 - u64 *sq_wrid = NULL; 1028 - u64 *rq_wrid = NULL; 1026 + u64 *sq_wrid, *rq_wrid = NULL; 1029 1027 int ret; 1030 1028 1031 - sq_wrid = kcalloc(hr_qp->sq.wqe_cnt, sizeof(u64), GFP_KERNEL); 1032 - if (!sq_wrid) { 1033 - ibdev_err(ibdev, "failed to alloc SQ wrid.\n"); 1029 + sq_wrid = kzalloc_objs(*sq_wrid, hr_qp->sq.wqe_cnt); 1030 + if (!sq_wrid) 1034 1031 return -ENOMEM; 1035 - } 1036 1032 1037 1033 if (hr_qp->rq.wqe_cnt) { 1038 - rq_wrid = kcalloc(hr_qp->rq.wqe_cnt, sizeof(u64), GFP_KERNEL); 1034 + rq_wrid = kzalloc_objs(*rq_wrid, hr_qp->rq.wqe_cnt); 1039 1035 if (!rq_wrid) { 1040 - ibdev_err(ibdev, "failed to alloc RQ wrid.\n"); 1041 1036 ret = -ENOMEM; 1042 1037 goto err_sq; 1043 1038 } ··· 1130 1135 } 1131 1136 1132 1137 if (udata) { 1133 - ret = ib_copy_from_udata(ucmd, udata, 1134 - min(udata->inlen, sizeof(*ucmd))); 1135 - if (ret) { 1136 - ibdev_err(ibdev, 1137 - "failed to copy QP ucmd, ret = %d\n", ret); 1138 + ret = ib_copy_validate_udata_in_cm( 1139 + udata, *ucmd, reserved, 1140 + HNS_ROCE_CREATE_QP_MASK_CONGEST_TYPE); 1141 + if (ret) 1138 1142 return ret; 1139 - } 1140 1143 1141 1144 uctx = rdma_udata_to_drv_context(udata, struct hns_roce_ucontext, 1142 1145 ibucontext);

+16 -38

drivers/infiniband/hw/hns/hns_roce_srq.c

··· 340 340 } 341 341 342 342 static int alloc_srq_buf(struct hns_roce_dev *hr_dev, struct hns_roce_srq *srq, 343 - struct ib_udata *udata) 343 + struct ib_udata *udata, 344 + struct hns_roce_ib_create_srq *ucmd) 344 345 { 345 - struct hns_roce_ib_create_srq ucmd = {}; 346 346 int ret; 347 347 348 - if (udata) { 349 - ret = ib_copy_from_udata(&ucmd, udata, 350 - min(udata->inlen, sizeof(ucmd))); 351 - if (ret) { 352 - ibdev_err(&hr_dev->ib_dev, 353 - "failed to copy SRQ udata, ret = %d.\n", 354 - ret); 355 - return ret; 356 - } 357 - } 358 - 359 - ret = alloc_srq_idx(hr_dev, srq, udata, ucmd.que_addr); 348 + ret = alloc_srq_idx(hr_dev, srq, udata, ucmd->que_addr); 360 349 if (ret) 361 350 return ret; 362 351 363 - ret = alloc_srq_wqe_buf(hr_dev, srq, udata, ucmd.buf_addr); 352 + ret = alloc_srq_wqe_buf(hr_dev, srq, udata, ucmd->buf_addr); 364 353 if (ret) 365 354 goto err_idx; 366 355 ··· 376 387 free_srq_idx(hr_dev, srq); 377 388 } 378 389 379 - static int get_srq_ucmd(struct hns_roce_srq *srq, struct ib_udata *udata, 380 - struct hns_roce_ib_create_srq *ucmd) 381 - { 382 - struct ib_device *ibdev = srq->ibsrq.device; 383 - int ret; 384 - 385 - ret = ib_copy_from_udata(ucmd, udata, min(udata->inlen, sizeof(*ucmd))); 386 - if (ret) { 387 - ibdev_err(ibdev, "failed to copy SRQ udata, ret = %d.\n", ret); 388 - return ret; 389 - } 390 - 391 - return 0; 392 - } 393 390 394 391 static void free_srq_db(struct hns_roce_dev *hr_dev, struct hns_roce_srq *srq, 395 392 struct ib_udata *udata) ··· 398 423 399 424 static int alloc_srq_db(struct hns_roce_dev *hr_dev, struct hns_roce_srq *srq, 400 425 struct ib_udata *udata, 426 + struct hns_roce_ib_create_srq *ucmd, 401 427 struct hns_roce_ib_create_srq_resp *resp) 402 428 { 403 - struct hns_roce_ib_create_srq ucmd = {}; 404 429 struct hns_roce_ucontext *uctx; 405 430 int ret; 406 431 407 432 if (udata) { 408 - ret = get_srq_ucmd(srq, udata, &ucmd); 409 - if (ret) 410 - return ret; 411 - 412 433 if ((hr_dev->caps.flags & 
HNS_ROCE_CAP_FLAG_SRQ_RECORD_DB) && 413 - (ucmd.req_cap_flags & HNS_ROCE_SRQ_CAP_RECORD_DB)) { 434 + (ucmd->req_cap_flags & HNS_ROCE_SRQ_CAP_RECORD_DB)) { 414 435 uctx = rdma_udata_to_drv_context(udata, 415 436 struct hns_roce_ucontext, ibucontext); 416 - ret = hns_roce_db_map_user(uctx, ucmd.db_addr, 437 + ret = hns_roce_db_map_user(uctx, ucmd->db_addr, 417 438 &srq->rdb); 418 439 if (ret) 419 440 return ret; ··· 438 467 struct hns_roce_dev *hr_dev = to_hr_dev(ib_srq->device); 439 468 struct hns_roce_ib_create_srq_resp resp = {}; 440 469 struct hns_roce_srq *srq = to_hr_srq(ib_srq); 470 + struct hns_roce_ib_create_srq ucmd = {}; 441 471 int ret; 442 472 443 473 mutex_init(&srq->mutex); ··· 448 476 if (ret) 449 477 goto err_out; 450 478 451 - ret = alloc_srq_buf(hr_dev, srq, udata); 479 + if (udata) { 480 + ret = ib_copy_validate_udata_in(udata, ucmd, que_addr); 481 + if (ret) 482 + goto err_out; 483 + } 484 + 485 + ret = alloc_srq_buf(hr_dev, srq, udata, &ucmd); 452 486 if (ret) 453 487 goto err_out; 454 488 455 - ret = alloc_srq_db(hr_dev, srq, udata, &resp); 489 + ret = alloc_srq_db(hr_dev, srq, udata, &ucmd, &resp); 456 490 if (ret) 457 491 goto err_srq_buf; 458 492

+3 -3

drivers/infiniband/hw/ionic/ionic_controlpath.c

··· 373 373 phys_addr_t db_phys = 0; 374 374 int rc; 375 375 376 - rc = ib_copy_from_udata(&req, udata, sizeof(req)); 376 + rc = ib_copy_validate_udata_in(udata, req, rsvd); 377 377 if (rc) 378 378 return rc; 379 379 ··· 1225 1225 int udma_idx = 0, rc; 1226 1226 1227 1227 if (udata) { 1228 - rc = ib_copy_from_udata(&req, udata, sizeof(req)); 1228 + rc = ib_copy_validate_udata_in(udata, req, rsvd); 1229 1229 if (rc) 1230 1230 return rc; 1231 1231 } ··· 2154 2154 int rc; 2155 2155 2156 2156 if (udata) { 2157 - rc = ib_copy_from_udata(&req, udata, sizeof(req)); 2157 + rc = ib_copy_validate_udata_in(udata, req, rsvd); 2158 2158 if (rc) 2159 2159 return rc; 2160 2160 } else {

+1 -1

drivers/infiniband/hw/ionic/ionic_ibdev.c

··· 185 185 struct ionic_ibdev *dev = 186 186 rdma_device_to_drv_device(device, struct ionic_ibdev, ibdev); 187 187 188 - return sysfs_emit(buf, "%s\n", dev->ibdev.node_desc); 188 + return sysfs_emit(buf, "%.64s\n", dev->ibdev.node_desc); 189 189 } 190 190 static DEVICE_ATTR_RO(hca_type); 191 191
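The `%.64s` format bounds the hca_type output with a precision, so at most 64 bytes of node_desc are read even if the array holds no NUL. A bare `%s` keeps reading until it finds one. The user-space sketch below illustrates this with snprintf(). struct demo_dev is hypothetical, and its 64-byte field matches the kernel's IB_DEVICE_NODE_DESC_MAX.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical device struct: node_desc is a fixed 64-byte field that may be
 * filled completely, leaving no terminating NUL inside the array. */
struct demo_dev {
	char node_desc[64];
	char next_field[16];
};

/* Format like the bounded show routine: the precision caps the read at 64 bytes. */
static int show_node_desc(char *buf, size_t size, const struct demo_dev *dev)
{
	assert(buf && size > 0 && dev);
	return snprintf(buf, size, "%.64s\n", dev->node_desc);
}
```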

+1 -1

drivers/infiniband/hw/ionic/ionic_ibdev.h

··· 4 4 #ifndef _IONIC_IBDEV_H_ 5 5 #define _IONIC_IBDEV_H_ 6 6 7 - #include <rdma/ib_umem.h> 8 7 #include <rdma/ib_verbs.h> 9 8 #include <rdma/ib_pack.h> 9 + #include <rdma/iter.h> 10 10 #include <rdma/uverbs_ioctl.h> 11 11 12 12 #include <rdma/ionic-abi.h>

+43 -1

drivers/infiniband/hw/irdma/ctrl.c

··· 3570 3570 hmc_fpm_misc->loc_mem_pages = (u32)FIELD_GET(IRDMA_QUERY_FPM_LOC_MEM_PAGES, temp); 3571 3571 if (!hmc_fpm_misc->loc_mem_pages) 3572 3572 return -EINVAL; 3573 + 3574 + get_64bit_val(buf, 184, &temp); 3575 + if (temp) { 3576 + hmc_fpm_misc->fw_scratch_buf0.size = temp; 3577 + hmc_fpm_misc->fw_scratch_buf0.va = 3578 + dma_alloc_coherent(dev->hw->device, 3579 + hmc_fpm_misc->fw_scratch_buf0.size, 3580 + &hmc_fpm_misc->fw_scratch_buf0.pa, 3581 + GFP_KERNEL); 3582 + 3583 + if (!hmc_fpm_misc->fw_scratch_buf0.va) { 3584 + hmc_fpm_misc->fw_scratch_buf0.size = 0; 3585 + return -ENOMEM; 3586 + } 3587 + } 3588 + get_64bit_val(buf, 192, &temp); 3589 + if (temp) { 3590 + hmc_fpm_misc->fw_scratch_buf1.size = temp; 3591 + hmc_fpm_misc->fw_scratch_buf1.va = 3592 + dma_alloc_coherent(dev->hw->device, 3593 + hmc_fpm_misc->fw_scratch_buf1.size, 3594 + &hmc_fpm_misc->fw_scratch_buf1.pa, 3595 + GFP_KERNEL); 3596 + 3597 + if (!hmc_fpm_misc->fw_scratch_buf1.va) { 3598 + hmc_fpm_misc->fw_scratch_buf1.size = 0; 3599 + dma_free_coherent(dev->hw->device, 3600 + hmc_fpm_misc->fw_scratch_buf0.size, 3601 + hmc_fpm_misc->fw_scratch_buf0.va, 3602 + hmc_fpm_misc->fw_scratch_buf0.pa); 3603 + hmc_fpm_misc->fw_scratch_buf0.va = NULL; 3604 + hmc_fpm_misc->fw_scratch_buf0.size = 0; 3605 + return -ENOMEM; 3606 + } 3607 + } 3573 3608 } 3574 3609 3575 3610 return 0; ··· 4222 4187 4223 4188 hdr = FIELD_PREP(IRDMA_CQPSQ_BUFSIZE, IRDMA_COMMIT_FPM_BUF_SIZE) | 4224 4189 FIELD_PREP(IRDMA_CQPSQ_OPCODE, IRDMA_CQP_OP_COMMIT_FPM_VAL) | 4190 + FIELD_PREP(IRDMA_CQPSQ_CFPM_FW_SCRATCH_BUF_PRESENT, 4191 + cqp->dev->hmc_fpm_misc.fw_scratch_buf0.va != NULL) | 4225 4192 FIELD_PREP(IRDMA_CQPSQ_WQEVALID, cqp->polarity); 4226 4193 4227 4194 dma_wmb(); /* make sure WQE is written before valid bit is set */ ··· 5071 5034 5072 5035 for (offset = 0; offset < IRDMA_COMMIT_FPM_BUF_SIZE; 5073 5036 offset += sizeof(__le64)) { 5074 - if (offset == IRDMA_PBLE_COMMIT_OFFSET) 5037 + if (offset == IRDMA_PBLE_COMMIT_OFFSET || 
5038 + offset == IRDMA_SCRATCH_BUF0_COMMIT_OFFSET || 5039 + offset == IRDMA_SCRATCH_BUF1_COMMIT_OFFSET) 5075 5040 continue; 5076 5041 get_64bit_val(buf, offset, &temp); 5077 5042 if (temp) ··· 5129 5090 (u64)obj_info[IRDMA_HMC_IW_OOISC].cnt); 5130 5091 set_64bit_val(buf, 168, 5131 5092 (u64)obj_info[IRDMA_HMC_IW_OOISCFFL].cnt); 5093 + set_64bit_val(buf, 192, dev->hmc_fpm_misc.fw_scratch_buf0.pa); 5094 + set_64bit_val(buf, 200, dev->hmc_fpm_misc.fw_scratch_buf1.pa); 5132 5095 if (dev->hw_attrs.uk_attrs.hw_rev >= IRDMA_GEN_3 && 5133 5096 dev->hmc_fpm_misc.loc_mem_pages) 5134 5097 irdma_set_loc_mem(buf); ··· 6465 6424 icrdma_init_hw(dev); 6466 6425 break; 6467 6426 case IRDMA_GEN_3: 6427 + case IRDMA_GEN_4: 6468 6428 ig3rdma_init_hw(dev); 6469 6429 break; 6470 6430 }
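The irdma query path allocates up to two optional firmware scratch buffers. It skips any buffer whose reported size is zero, and it releases the first buffer if the second allocation fails. The sketch below models that both-or-neither pattern in user space, with a pluggable allocator standing in for dma_alloc_coherent(). struct scratch_buf, scratch_setup() and budget_alloc() are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct irdma_dma_mem. */
struct scratch_buf {
	void *va;
	size_t size;
};

typedef void *(*alloc_fn)(size_t size);

/* Test helper: succeeds for the first alloc_budget calls, then fails. */
static int alloc_budget;
static void *budget_alloc(size_t size)
{
	if (alloc_budget-- <= 0)
		return NULL;
	return malloc(size);
}

/* Allocate a buffer only if firmware reported a non-zero size. */
static int scratch_alloc(struct scratch_buf *b, size_t size, alloc_fn alloc)
{
	if (!size)
		return 0;		/* not requested: leave va NULL */
	b->va = alloc(size);
	if (!b->va) {
		b->size = 0;
		return -ENOMEM;
	}
	b->size = size;
	return 0;
}

/* Both-or-neither: if buf1 fails, release buf0 so nothing leaks. */
static int scratch_setup(struct scratch_buf *b0, struct scratch_buf *b1,
			 size_t size0, size_t size1, alloc_fn alloc)
{
	int rc;

	assert(b0 && b1 && alloc);
	rc = scratch_alloc(b0, size0, alloc);
	if (rc)
		return rc;
	rc = scratch_alloc(b1, size1, alloc);
	if (rc) {
		free(b0->va);		/* free(NULL) is a no-op if buf0 was skipped */
		b0->va = NULL;
		b0->size = 0;
	}
	return rc;
}
```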

+4

drivers/infiniband/hw/irdma/defs.h

··· 133 133 #define MAX_MR_PER_SD 0x8000 134 134 #define MAX_MR_SD_PER_FCN 0x80 135 135 #define IRDMA_PBLE_COMMIT_OFFSET 112 136 + #define IRDMA_SCRATCH_BUF0_COMMIT_OFFSET 192 137 + #define IRDMA_SCRATCH_BUF1_COMMIT_OFFSET 200 136 138 #define IRDMA_MAX_QUANTA_PER_WR 8 137 139 138 140 #define IRDMA_QP_SW_MAX_WQ_QUANTA 32768 ··· 660 658 #define IRDMA_COMMIT_FPM_QPCNT GENMASK_ULL(20, 0) 661 659 #define IRDMA_COMMIT_FPM_BASE_S 32 662 660 #define IRDMA_CQPSQ_CFPM_HMCFNID GENMASK_ULL(15, 0) 661 + #define IRDMA_CQPSQ_CFPM_FW_SCRATCH_BUF_PRESENT_S 38 662 + #define IRDMA_CQPSQ_CFPM_FW_SCRATCH_BUF_PRESENT BIT_ULL(38) 663 663 664 664 #define IRDMA_CQPSQ_FWQE_AECODE GENMASK_ULL(15, 0) 665 665 #define IRDMA_CQPSQ_FWQE_AESOURCE GENMASK_ULL(19, 16)

+23 -6

drivers/infiniband/hw/irdma/hw.c

··· 1033 1033 if (!cqp->cqp_requests) 1034 1034 return -ENOMEM; 1035 1035 1036 - cqp->scratch_array = kcalloc(sqsize, sizeof(*cqp->scratch_array), GFP_KERNEL); 1036 + cqp->scratch_array = kzalloc_objs(*cqp->scratch_array, sqsize); 1037 1037 if (!cqp->scratch_array) { 1038 1038 status = -ENOMEM; 1039 1039 goto err_scratch; ··· 1082 1082 cqp_init_info.hw_maj_ver = IRDMA_CQPHC_HW_MAJVER_GEN_2; 1083 1083 break; 1084 1084 case IRDMA_GEN_3: 1085 + case IRDMA_GEN_4: 1085 1086 cqp_init_info.hw_maj_ver = IRDMA_CQPHC_HW_MAJVER_GEN_3; 1086 1087 cqp_init_info.ts_override = 1; 1087 1088 break; ··· 1509 1508 hmc_info->hmc_obj[IRDMA_HMC_IW_CQ].cnt; 1510 1509 aeq_size = min(aeq_size, dev->hw_attrs.max_hw_aeq_size); 1511 1510 /* GEN_3 does not support virtual AEQ. Cap at max Kernel alloc size */ 1512 - if (rf->rdma_ver == IRDMA_GEN_3) 1511 + if (rf->rdma_ver >= IRDMA_GEN_3) 1513 1512 aeq_size = min(aeq_size, (u32)((PAGE_SIZE << MAX_PAGE_ORDER) / 1514 1513 sizeof(struct irdma_sc_aeqe))); 1515 1514 aeq->mem.size = ALIGN(sizeof(struct irdma_sc_aeqe) * aeq_size, ··· 1519 1518 GFP_KERNEL | __GFP_NOWARN); 1520 1519 if (aeq->mem.va) 1521 1520 goto skip_virt_aeq; 1522 - else if (rf->rdma_ver == IRDMA_GEN_3) 1521 + else if (rf->rdma_ver >= IRDMA_GEN_3) 1523 1522 return -ENOMEM; 1524 1523 1525 1524 /* physically mapped aeq failed. 
setup virtual aeq */ ··· 1694 1693 static void irdma_del_init_mem(struct irdma_pci_f *rf) 1695 1694 { 1696 1695 struct irdma_sc_dev *dev = &rf->sc_dev; 1696 + struct irdma_dma_mem *fw_scratch_buf0; 1697 + struct irdma_dma_mem *fw_scratch_buf1; 1697 1698 1698 1699 if (!rf->sc_dev.privileged) 1699 1700 irdma_vchnl_req_put_hmc_fcn(&rf->sc_dev); ··· 1716 1713 rf->iw_msixtbl = NULL; 1717 1714 kfree(rf->hmc_info_mem); 1718 1715 rf->hmc_info_mem = NULL; 1716 + 1717 + fw_scratch_buf0 = &dev->hmc_fpm_misc.fw_scratch_buf0; 1718 + fw_scratch_buf1 = &dev->hmc_fpm_misc.fw_scratch_buf1; 1719 + if (fw_scratch_buf0->va) 1720 + dma_free_coherent(dev->hw->device, fw_scratch_buf0->size, 1721 + fw_scratch_buf0->va, fw_scratch_buf0->pa); 1722 + if (fw_scratch_buf1->va) 1723 + dma_free_coherent(dev->hw->device, fw_scratch_buf1->size, 1724 + fw_scratch_buf1->va, fw_scratch_buf1->pa); 1719 1725 } 1720 1726 1721 1727 /** ··· 1954 1942 if (status) 1955 1943 return status; 1956 1944 1957 - stats_info.pestat = kzalloc(sizeof(*stats_info.pestat), GFP_KERNEL); 1945 + stats_info.pestat = kzalloc_obj(*stats_info.pestat); 1958 1946 if (!stats_info.pestat) { 1959 1947 irdma_cleanup_cm_core(&iwdev->cm_core); 1960 1948 return -ENOMEM; ··· 2193 2181 set_bit(2, rf->allocated_pds); 2194 2182 2195 2183 INIT_LIST_HEAD(&rf->mc_qht_list.list); 2196 - /* stag index mask has a minimum of 14 bits */ 2197 - mrdrvbits = 24 - max(get_count_order(rf->max_mr), 14); 2184 + 2185 + if (rf->rdma_ver >= IRDMA_GEN_4) 2186 + mrdrvbits = 24 - max(get_count_order(rf->max_mr), 16); 2187 + else 2188 + /* stag index mask has a minimum of 14 bits */ 2189 + mrdrvbits = 24 - max(get_count_order(rf->max_mr), 14); 2190 + 2198 2191 rf->mr_stagmask = ~(((1 << mrdrvbits) - 1) << (32 - mrdrvbits)); 2199 2192 2200 2193 return 0;

-1

drivers/infiniband/hw/irdma/ig3rdma_hw.c

··· 113 113 dev->irq_ops = &ig3rdma_irq_ops; 114 114 dev->hw_stats_map = ig3rdma_hw_stat_map; 115 115 116 - dev->hw_attrs.uk_attrs.hw_rev = IRDMA_GEN_3; 117 116 dev->hw_attrs.uk_attrs.max_hw_wq_frags = IG3RDMA_MAX_WQ_FRAGMENT_COUNT; 118 117 dev->hw_attrs.uk_attrs.max_hw_read_sges = IG3RDMA_MAX_SGE_RD; 119 118 dev->hw_attrs.uk_attrs.max_hw_sq_chunk = IRDMA_MAX_QUANTA_PER_WR;

+1

drivers/infiniband/hw/irdma/irdma.h

··· 119 119 IRDMA_GEN_1, 120 120 IRDMA_GEN_2, 121 121 IRDMA_GEN_3, 122 + IRDMA_GEN_4, 122 123 IRDMA_GEN_NEXT, 123 124 IRDMA_GEN_MAX = IRDMA_GEN_NEXT-1 124 125 };

+1 -1

drivers/infiniband/hw/irdma/main.h

··· 37 37 #include <rdma/rdma_cm.h> 38 38 #include <rdma/iw_cm.h> 39 39 #include <rdma/ib_user_verbs.h> 40 - #include <rdma/ib_umem.h> 41 40 #include <rdma/ib_cache.h> 41 + #include <rdma/iter.h> 42 42 #include <rdma/uverbs_ioctl.h> 43 43 #include "osdep.h" 44 44 #include "defs.h"

+2

drivers/infiniband/hw/irdma/type.h

··· 622 622 u32 timer_bucket; 623 623 u32 rrf_block_size; 624 624 u32 ooiscf_block_size; 625 + struct irdma_dma_mem fw_scratch_buf0; 626 + struct irdma_dma_mem fw_scratch_buf1; 625 627 }; 626 628 627 629 #define IRDMA_VCHNL_MAX_MSG_SIZE 512

+2 -2

drivers/infiniband/hw/irdma/user.h

··· 159 159 IRDMA_CEQE_SIZE = 1, 160 160 IRDMA_CQP_CTX_SIZE = 8, 161 161 IRDMA_SHADOW_AREA_SIZE = 8, 162 - IRDMA_QUERY_FPM_BUF_SIZE = 192, 163 - IRDMA_COMMIT_FPM_BUF_SIZE = 192, 162 + IRDMA_QUERY_FPM_BUF_SIZE = 200, 163 + IRDMA_COMMIT_FPM_BUF_SIZE = 208, 164 164 IRDMA_GATHER_STATS_BUF_SIZE = 1024, 165 165 IRDMA_MIN_IW_QP_ID = 0, 166 166 IRDMA_MAX_IW_QP_ID = 262143,

+102 -19

drivers/infiniband/hw/irdma/verbs.c

··· 284 284 static int irdma_alloc_ucontext(struct ib_ucontext *uctx, 285 285 struct ib_udata *udata) 286 286 { 287 - #define IRDMA_ALLOC_UCTX_MIN_REQ_LEN offsetofend(struct irdma_alloc_ucontext_req, rsvd8) 288 287 #define IRDMA_ALLOC_UCTX_MIN_RESP_LEN offsetofend(struct irdma_alloc_ucontext_resp, rsvd) 289 288 struct ib_device *ibdev = uctx->device; 290 289 struct irdma_device *iwdev = to_iwdev(ibdev); ··· 291 292 struct irdma_alloc_ucontext_resp uresp = {}; 292 293 struct irdma_ucontext *ucontext = to_ucontext(uctx); 293 294 struct irdma_uk_attrs *uk_attrs = &iwdev->rf->sc_dev.hw_attrs.uk_attrs; 295 + int ret; 294 296 295 - if (udata->inlen < IRDMA_ALLOC_UCTX_MIN_REQ_LEN || 296 - udata->outlen < IRDMA_ALLOC_UCTX_MIN_RESP_LEN) 297 + if (udata->outlen < IRDMA_ALLOC_UCTX_MIN_RESP_LEN) 297 298 return -EINVAL; 298 299 299 - if (ib_copy_from_udata(&req, udata, min(sizeof(req), udata->inlen))) 300 - return -EINVAL; 300 + ret = ib_copy_validate_udata_in_cm(udata, req, rsvd8, 301 + IRDMA_ALLOC_UCTX_USE_RAW_ATTR | 302 + IRDMA_SUPPORT_WQE_FORMAT_V2); 303 + if (ret) 304 + return ret; 301 305 302 306 if (req.userspace_ver < 4 || req.userspace_ver > IRDMA_ABI_VER) 303 307 goto ver_error; ··· 2015 2013 * @entries: desired cq size 2016 2014 * @udata: user data 2017 2015 */ 2018 - static int irdma_resize_cq(struct ib_cq *ibcq, int entries, 2016 + static int irdma_resize_cq(struct ib_cq *ibcq, unsigned int entries, 2019 2017 struct ib_udata *udata) 2020 2018 { 2021 2019 #define IRDMA_RESIZE_CQ_MIN_REQ_LEN offsetofend(struct irdma_resize_cq_req, user_cq_buffer) ··· 3593 3591 return ERR_PTR(err); 3594 3592 } 3595 3593 3594 + static int irdma_hwdereg_mr(struct ib_mr *ib_mr); 3595 + 3596 + static void irdma_umem_dmabuf_revoke(void *priv) 3597 + { 3598 + /* priv is guaranteed to be valid any time this callback is invoked 3599 + * because we do not set the callback until after successful iwmr 3600 + * allocation and initialization. 
3601 + */ 3602 + struct irdma_mr *iwmr = priv; 3603 + int err; 3604 + 3605 + /* Invalidate the key in hardware. This does not actually release the 3606 + * key for potential reuse - that only occurs when the region is fully 3607 + * deregistered. 3608 + * 3609 + * The irdma_hwdereg_mr call is a no-op if the region is not currently 3610 + * registered with hardware. 3611 + */ 3612 + err = irdma_hwdereg_mr(&iwmr->ibmr); 3613 + if (err) { 3614 + struct irdma_device *iwdev = to_iwdev(iwmr->ibmr.device); 3615 + 3616 + ibdev_err(&iwdev->ibdev, "dmabuf mr revoke failed %d", err); 3617 + if (!iwdev->rf->reset) { 3618 + iwdev->rf->reset = true; 3619 + iwdev->rf->gen_ops.request_reset(iwdev->rf); 3620 + } 3621 + } 3622 + } 3623 + 3596 3624 static struct ib_mr *irdma_reg_user_mr_dmabuf(struct ib_pd *pd, u64 start, 3597 3625 u64 len, u64 virt, 3598 3626 int fd, int access, ··· 3640 3608 if (len > iwdev->rf->sc_dev.hw_attrs.max_mr_size) 3641 3609 return ERR_PTR(-EINVAL); 3642 3610 3643 - umem_dmabuf = ib_umem_dmabuf_get_pinned(pd->device, start, len, fd, access); 3611 + umem_dmabuf = 3612 + ib_umem_dmabuf_get_pinned_revocable_and_lock(pd->device, start, 3613 + len, fd, access); 3644 3614 if (IS_ERR(umem_dmabuf)) { 3645 3615 ibdev_dbg(&iwdev->ibdev, "Failed to get dmabuf umem[%pe]\n", 3646 3616 umem_dmabuf); ··· 3659 3625 if (err) 3660 3626 goto err_iwmr; 3661 3627 3628 + ib_umem_dmabuf_set_revoke_locked(umem_dmabuf, irdma_umem_dmabuf_revoke, 3629 + iwmr); 3630 + ib_umem_dmabuf_revoke_unlock(umem_dmabuf); 3662 3631 return &iwmr->ibmr; 3663 3632 3664 3633 err_iwmr: 3665 3634 irdma_free_iwmr(iwmr); 3666 3635 3667 3636 err_release: 3637 + ib_umem_dmabuf_revoke_unlock(umem_dmabuf); 3638 + 3639 + /* Will result in a call to revoke, but driver callback is not set and 3640 + * is therefore skipped. 
3641 + */ 3668 3642 ib_umem_release(&umem_dmabuf->umem); 3669 3643 3670 3644 return ERR_PTR(err); ··· 3793 3751 struct irdma_device *iwdev = to_iwdev(ib_mr->device); 3794 3752 struct irdma_mr *iwmr = to_iwmr(ib_mr); 3795 3753 struct irdma_pbl *iwpbl = &iwmr->iwpbl; 3754 + bool dmabuf_revocable = iwmr->region && iwmr->region->is_dmabuf; 3755 + struct ib_umem_dmabuf *umem_dmabuf; 3796 3756 int ret; 3797 3757 3798 3758 if (len > iwdev->rf->sc_dev.hw_attrs.max_mr_size) ··· 3803 3759 if (flags & ~(IB_MR_REREG_TRANS | IB_MR_REREG_PD | IB_MR_REREG_ACCESS)) 3804 3760 return ERR_PTR(-EOPNOTSUPP); 3805 3761 3762 + if (dmabuf_revocable) { 3763 + umem_dmabuf = to_ib_umem_dmabuf(iwmr->region); 3764 + 3765 + ib_umem_dmabuf_revoke_lock(umem_dmabuf); 3766 + 3767 + /* If the dmabuf has been revoked, it means that the region has 3768 + * been invalidated in HW. We must not allow it to become valid 3769 + * again unless the user is requesting a change in translation 3770 + * which will end up dropping the umem dmabuf and allocating an 3771 + * entirely new umem anyway. 
3772 + */ 3773 + if (umem_dmabuf->revoked && !(flags & IB_MR_REREG_TRANS)) { 3774 + ret = -EINVAL; 3775 + goto err_unlock; 3776 + } 3777 + } 3778 + 3806 3779 ret = irdma_hwdereg_mr(ib_mr); 3807 3780 if (ret) 3808 - return ERR_PTR(ret); 3781 + goto err_unlock; 3809 3782 3810 3783 if (flags & IB_MR_REREG_ACCESS) 3811 3784 iwmr->access = new_access; ··· 3838 3777 &iwpbl->pble_alloc); 3839 3778 iwpbl->pbl_allocated = false; 3840 3779 } 3780 + 3781 + if (dmabuf_revocable) { 3782 + /* Must unlock before release to prevent deadlock */ 3783 + ib_umem_dmabuf_revoke_unlock(umem_dmabuf); 3784 + dmabuf_revocable = false; 3785 + } 3786 + 3841 3787 if (iwmr->region) { 3842 3788 ib_umem_release(iwmr->region); 3843 3789 iwmr->region = NULL; 3844 3790 } 3845 3791 3846 3792 ret = irdma_rereg_mr_trans(iwmr, start, len, virt); 3847 - } else 3793 + } else { 3848 3794 ret = irdma_hwreg_mr(iwdev, iwmr, iwmr->access); 3849 - if (ret) 3850 - return ERR_PTR(ret); 3795 + } 3851 3796 3852 - return NULL; 3797 + err_unlock: 3798 + if (dmabuf_revocable) 3799 + ib_umem_dmabuf_revoke_unlock(umem_dmabuf); 3800 + 3801 + return ret ? 
ERR_PTR(ret) : NULL; 3853 3802 } 3854 3803 3855 3804 /** ··· 3982 3911 struct irdma_mr *iwmr = to_iwmr(ib_mr); 3983 3912 struct irdma_device *iwdev = to_iwdev(ib_mr->device); 3984 3913 struct irdma_pbl *iwpbl = &iwmr->iwpbl; 3914 + bool dmabuf_revocable = iwmr->region && iwmr->region->is_dmabuf; 3985 3915 int ret; 3986 3916 3987 3917 if (iwmr->type != IRDMA_MEMREG_TYPE_MEM) { ··· 3997 3925 goto done; 3998 3926 } 3999 3927 4000 - ret = irdma_hwdereg_mr(ib_mr); 4001 - if (ret) 4002 - return ret; 3928 + if (!dmabuf_revocable) { 3929 + ret = irdma_hwdereg_mr(ib_mr); 3930 + if (ret) 3931 + return ret; 4003 3932 4004 - irdma_free_stag(iwdev, iwmr->stag); 3933 + irdma_free_stag(iwdev, iwmr->stag); 3934 + } 4005 3935 done: 3936 + if (iwmr->region) 3937 + /* For dmabuf MRs, ib_umem_release will trigger a synchronous 3938 + * call to the revoke callback which will perform the actual HW 3939 + * invalidation via irdma_hwdereg_mr. We rely on this for its 3940 + * implicit serialization w.r.t. concurrent revocations. This 3941 + * must be done before freeing the PBLEs. 3942 + */ 3943 + ib_umem_release(iwmr->region); 3944 + 4006 3945 if (iwpbl->pbl_allocated) 4007 3946 irdma_free_pble(iwdev->rf->pble_rsrc, &iwpbl->pble_alloc); 4008 3947 4009 - if (iwmr->region) 4010 - ib_umem_release(iwmr->region); 3948 + if (dmabuf_revocable) 3949 + irdma_free_stag(iwdev, iwmr->stag); 4011 3950 4012 3951 kfree(iwmr); 4013 3952 ··· 5465 5382 .reg_user_mr_dmabuf = irdma_reg_user_mr_dmabuf, 5466 5383 .rereg_user_mr = irdma_rereg_user_mr, 5467 5384 .req_notify_cq = irdma_req_notify_cq, 5468 - .resize_cq = irdma_resize_cq, 5385 + .resize_user_cq = irdma_resize_cq, 5469 5386 INIT_RDMA_OBJ_SIZE(ib_pd, irdma_pd, ibpd), 5470 5387 INIT_RDMA_OBJ_SIZE(ib_ucontext, irdma_ucontext, ibucontext), 5471 5388 INIT_RDMA_OBJ_SIZE(ib_ah, irdma_ah, ibah),

+3 -8

drivers/infiniband/hw/mana/cq.c

··· 13 13 struct mana_ib_create_cq_resp resp = {}; 14 14 struct mana_ib_ucontext *mana_ucontext; 15 15 struct ib_device *ibdev = ibcq->device; 16 - struct mana_ib_create_cq ucmd = {}; 16 + struct mana_ib_create_cq ucmd; 17 17 struct mana_ib_dev *mdev; 18 18 bool is_rnic_cq; 19 19 u32 doorbell; ··· 27 27 is_rnic_cq = mana_ib_is_rnic(mdev); 28 28 29 29 if (udata) { 30 - if (udata->inlen < offsetof(struct mana_ib_create_cq, flags)) 31 - return -EINVAL; 32 - 33 - err = ib_copy_from_udata(&ucmd, udata, min(sizeof(ucmd), udata->inlen)); 34 - if (err) { 35 - ibdev_dbg(ibdev, "Failed to copy from udata for create cq, %d\n", err); 30 + err = ib_copy_validate_udata_in(udata, ucmd, buf_addr); 31 + if (err) 36 32 return err; 37 - } 38 33 39 34 if ((!is_rnic_cq && attr->cqe > mdev->adapter_caps.max_qp_wr) || 40 35 attr->cqe > U32_MAX / COMP_ENTRY_SIZE) {

+3

drivers/infiniband/hw/mana/device.c

··· 17 17 .uverbs_abi_ver = MANA_IB_UVERBS_ABI_VERSION, 18 18 19 19 .add_gid = mana_ib_gd_add_gid, 20 + .alloc_mw = mana_ib_alloc_mw, 20 21 .alloc_pd = mana_ib_alloc_pd, 21 22 .alloc_ucontext = mana_ib_alloc_ucontext, 22 23 .create_ah = mana_ib_create_ah, ··· 25 24 .create_qp = mana_ib_create_qp, 26 25 .create_rwq_ind_table = mana_ib_create_rwq_ind_table, 27 26 .create_wq = mana_ib_create_wq, 27 + .dealloc_mw = mana_ib_dealloc_mw, 28 28 .dealloc_pd = mana_ib_dealloc_pd, 29 29 .dealloc_ucontext = mana_ib_dealloc_ucontext, 30 30 .del_gid = mana_ib_gd_del_gid, ··· 55 53 56 54 INIT_RDMA_OBJ_SIZE(ib_ah, mana_ib_ah, ibah), 57 55 INIT_RDMA_OBJ_SIZE(ib_cq, mana_ib_cq, ibcq), 56 + INIT_RDMA_OBJ_SIZE(ib_mw, mana_ib_mw, ibmw), 58 57 INIT_RDMA_OBJ_SIZE(ib_pd, mana_ib_pd, ibpd), 59 58 INIT_RDMA_OBJ_SIZE(ib_qp, mana_ib_qp, ibqp), 60 59 INIT_RDMA_OBJ_SIZE(ib_ucontext, mana_ib_ucontext, ibucontext),

+26 -115

drivers/infiniband/hw/mana/main.c

··· 87 87 flags |= GDMA_PD_FLAG_ALLOW_GPA_MR; 88 88 89 89 req.flags = flags; 90 - err = mana_gd_send_request(gc, sizeof(req), &req, 91 - sizeof(resp), &resp); 92 - 93 - if (err || resp.hdr.status) { 94 - ibdev_dbg(&dev->ib_dev, 95 - "Failed to get pd_id err %d status %u\n", err, 96 - resp.hdr.status); 97 - if (!err) 98 - err = -EPROTO; 99 - 90 + err = mana_gd_send_request(gc, sizeof(req), &req, sizeof(resp), &resp); 91 + if (err) 100 92 return err; 101 - } 102 93 103 94 pd->pd_handle = resp.pd_handle; 104 95 pd->pdn = resp.pd_id; ··· 109 118 struct gdma_destroy_pd_req req = {}; 110 119 struct mana_ib_dev *dev; 111 120 struct gdma_context *gc; 112 - int err; 113 121 114 122 dev = container_of(ibdev, struct mana_ib_dev, ib_dev); 115 123 gc = mdev_to_gc(dev); ··· 117 127 sizeof(resp)); 118 128 119 129 req.pd_handle = pd->pd_handle; 120 - err = mana_gd_send_request(gc, sizeof(req), &req, 121 - sizeof(resp), &resp); 122 130 123 - if (err || resp.hdr.status) { 124 - ibdev_dbg(&dev->ib_dev, 125 - "Failed to destroy pd_handle 0x%llx err %d status %u", 126 - pd->pd_handle, err, resp.hdr.status); 127 - if (!err) 128 - err = -EPROTO; 129 - } 130 - 131 - return err; 131 + return mana_gd_send_request(gc, sizeof(req), &req, sizeof(resp), &resp); 132 132 } 133 133 134 134 static int mana_gd_destroy_doorbell_page(struct gdma_context *gc, ··· 126 146 { 127 147 struct gdma_destroy_resource_range_req req = {}; 128 148 struct gdma_resp_hdr resp = {}; 129 - int err; 130 149 131 150 mana_gd_init_req_hdr(&req.hdr, GDMA_DESTROY_RESOURCE_RANGE, 132 151 sizeof(req), sizeof(resp)); ··· 134 155 req.num_resources = 1; 135 156 req.allocated_resources = doorbell_page; 136 157 137 - err = mana_gd_send_request(gc, sizeof(req), &req, sizeof(resp), &resp); 138 - if (err || resp.status) { 139 - dev_err(gc->dev, 140 - "Failed to destroy doorbell page: ret %d, 0x%x\n", 141 - err, resp.status); 142 - return err ?: -EPROTO; 143 - } 144 - 145 - return 0; 158 + return mana_gd_send_request(gc, sizeof(req), 
&req, sizeof(resp), &resp); 146 159 } 147 160 148 161 static int mana_gd_allocate_doorbell_page(struct gdma_context *gc, ··· 155 184 req.allocated_resources = 0; 156 185 157 186 err = mana_gd_send_request(gc, sizeof(req), &req, sizeof(resp), &resp); 158 - if (err || resp.hdr.status) { 159 - dev_err(gc->dev, 160 - "Failed to allocate doorbell page: ret %d, 0x%x\n", 161 - err, resp.hdr.status); 162 - return err ?: -EPROTO; 163 - } 187 + if (err) 188 + return err; 164 189 165 190 *doorbell_page = resp.allocated_resources; 166 191 ··· 649 682 req.hdr.resp.msg_version = GDMA_MESSAGE_V4; 650 683 req.hdr.dev_id = dev->gdma_dev->dev_id; 651 684 652 - err = mana_gd_send_request(mdev_to_gc(dev), sizeof(req), 653 - &req, sizeof(resp), &resp); 654 - 655 - if (err) { 656 - ibdev_err(&dev->ib_dev, 657 - "Failed to query adapter caps err %d", err); 685 + err = mana_gd_send_request(mdev_to_gc(dev), sizeof(req), &req, 686 + sizeof(resp), &resp); 687 + if (err) 658 688 return err; 659 - } 660 689 661 690 caps->max_sq_id = resp.max_sq_id; 662 691 caps->max_rq_id = resp.max_rq_id; ··· 690 727 mana_gd_init_req_hdr(&req.hdr, GDMA_QUERY_MAX_RESOURCES, 691 728 sizeof(req), sizeof(resp)); 692 729 693 - err = mana_gd_send_request(mdev_to_gc(dev), sizeof(req), &req, sizeof(resp), &resp); 694 - if (err) { 695 - ibdev_err(&dev->ib_dev, 696 - "Failed to query adapter caps err %d", err); 730 + err = mana_gd_send_request(mdev_to_gc(dev), sizeof(req), &req, 731 + sizeof(resp), &resp); 732 + if (err) 697 733 return err; 698 - } 699 734 700 735 caps->max_qp_count = min_t(u32, resp.max_sq, resp.max_rq); 701 736 caps->max_cq_count = resp.max_cq; ··· 808 847 req.feature_flags |= MANA_IB_FEATURE_CLIENT_ERROR_CQE_REQUEST; 809 848 810 849 err = mana_gd_send_request(gc, sizeof(req), &req, sizeof(resp), &resp); 811 - if (err) { 812 - ibdev_err(&mdev->ib_dev, "Failed to create RNIC adapter err %d", err); 850 + if (err) 813 851 return err; 814 - } 815 852 mdev->adapter_handle = resp.adapter; 816 853 817 854 
return 0; ··· 820 861 struct mana_rnic_destroy_adapter_resp resp = {}; 821 862 struct mana_rnic_destroy_adapter_req req = {}; 822 863 struct gdma_context *gc; 823 - int err; 824 864 825 865 gc = mdev_to_gc(mdev); 826 866 mana_gd_init_req_hdr(&req.hdr, MANA_IB_DESTROY_ADAPTER, sizeof(req), sizeof(resp)); 827 867 req.hdr.dev_id = mdev->gdma_dev->dev_id; 828 868 req.adapter = mdev->adapter_handle; 829 869 830 - err = mana_gd_send_request(gc, sizeof(req), &req, sizeof(resp), &resp); 831 - if (err) { 832 - ibdev_err(&mdev->ib_dev, "Failed to destroy RNIC adapter err %d", err); 833 - return err; 834 - } 835 - 836 - return 0; 870 + return mana_gd_send_request(gc, sizeof(req), &req, sizeof(resp), &resp); 837 871 } 838 872 839 873 int mana_ib_gd_add_gid(const struct ib_gid_attr *attr, void **context) ··· 836 884 struct mana_rnic_config_addr_resp resp = {}; 837 885 struct gdma_context *gc = mdev_to_gc(mdev); 838 886 struct mana_rnic_config_addr_req req = {}; 839 - int err; 840 887 841 888 if (ntype != RDMA_NETWORK_IPV4 && ntype != RDMA_NETWORK_IPV6) { 842 889 ibdev_dbg(&mdev->ib_dev, "Unsupported rdma network type %d", ntype); ··· 849 898 req.sgid_type = (ntype == RDMA_NETWORK_IPV6) ? 
SGID_TYPE_IPV6 : SGID_TYPE_IPV4; 850 899 copy_in_reverse(req.ip_addr, attr->gid.raw, sizeof(union ib_gid)); 851 900 852 - err = mana_gd_send_request(gc, sizeof(req), &req, sizeof(resp), &resp); 853 - if (err) { 854 - ibdev_err(&mdev->ib_dev, "Failed to config IP addr err %d\n", err); 855 - return err; 856 - } 857 - 858 - return 0; 901 + return mana_gd_send_request(gc, sizeof(req), &req, sizeof(resp), &resp); 859 902 } 860 903 861 904 int mana_ib_gd_del_gid(const struct ib_gid_attr *attr, void **context) ··· 859 914 struct mana_rnic_config_addr_resp resp = {}; 860 915 struct gdma_context *gc = mdev_to_gc(mdev); 861 916 struct mana_rnic_config_addr_req req = {}; 862 - int err; 863 917 864 918 if (ntype != RDMA_NETWORK_IPV4 && ntype != RDMA_NETWORK_IPV6) { 865 919 ibdev_dbg(&mdev->ib_dev, "Unsupported rdma network type %d", ntype); ··· 872 928 req.sgid_type = (ntype == RDMA_NETWORK_IPV6) ? SGID_TYPE_IPV6 : SGID_TYPE_IPV4; 873 929 copy_in_reverse(req.ip_addr, attr->gid.raw, sizeof(union ib_gid)); 874 930 875 - err = mana_gd_send_request(gc, sizeof(req), &req, sizeof(resp), &resp); 876 - if (err) { 877 - ibdev_err(&mdev->ib_dev, "Failed to config IP addr err %d\n", err); 878 - return err; 879 - } 880 - 881 - return 0; 931 + return mana_gd_send_request(gc, sizeof(req), &req, sizeof(resp), &resp); 882 932 } 883 933 884 934 int mana_ib_gd_config_mac(struct mana_ib_dev *mdev, enum mana_ib_addr_op op, u8 *mac) ··· 880 942 struct mana_rnic_config_mac_addr_resp resp = {}; 881 943 struct mana_rnic_config_mac_addr_req req = {}; 882 944 struct gdma_context *gc = mdev_to_gc(mdev); 883 - int err; 884 945 885 946 mana_gd_init_req_hdr(&req.hdr, MANA_IB_CONFIG_MAC_ADDR, sizeof(req), sizeof(resp)); 886 947 req.hdr.dev_id = mdev->gdma_dev->dev_id; ··· 887 950 req.op = op; 888 951 copy_in_reverse(req.mac_addr, mac, ETH_ALEN); 889 952 890 - err = mana_gd_send_request(gc, sizeof(req), &req, sizeof(resp), &resp); 891 - if (err) { 892 - ibdev_err(&mdev->ib_dev, "Failed to config Mac addr err 
%d", err); 893 - return err; 894 - } 895 - 896 - return 0; 953 + return mana_gd_send_request(gc, sizeof(req), &req, sizeof(resp), &resp); 897 954 } 898 955 899 956 int mana_ib_gd_create_cq(struct mana_ib_dev *mdev, struct mana_ib_cq *cq, u32 doorbell) ··· 927 996 struct gdma_context *gc = mdev_to_gc(mdev); 928 997 struct mana_rnic_destroy_cq_resp resp = {}; 929 998 struct mana_rnic_destroy_cq_req req = {}; 930 - int err; 931 999 932 1000 if (cq->cq_handle == INVALID_MANA_HANDLE) 933 1001 return 0; ··· 936 1006 req.adapter = mdev->adapter_handle; 937 1007 req.cq_handle = cq->cq_handle; 938 1008 939 - err = mana_gd_send_request(gc, sizeof(req), &req, sizeof(resp), &resp); 940 - 941 - if (err) { 942 - ibdev_err(&mdev->ib_dev, "Failed to destroy cq err %d", err); 943 - return err; 944 - } 945 - 946 - return 0; 1009 + return mana_gd_send_request(gc, sizeof(req), &req, sizeof(resp), &resp); 947 1010 } 948 1011 949 1012 int mana_ib_gd_create_rc_qp(struct mana_ib_dev *mdev, struct mana_ib_qp *qp, ··· 966 1043 req.flags = flags; 967 1044 968 1045 err = mana_gd_send_request(gc, sizeof(req), &req, sizeof(resp), &resp); 969 - if (err) { 970 - ibdev_err(&mdev->ib_dev, "Failed to create rc qp err %d", err); 1046 + if (err) 971 1047 return err; 972 - } 1048 + 973 1049 qp->qp_handle = resp.rc_qp_handle; 974 1050 for (i = 0; i < MANA_RC_QUEUE_TYPE_MAX; i++) { 975 1051 qp->rc_qp.queues[i].id = resp.queue_ids[i]; ··· 983 1061 struct mana_rnic_destroy_rc_qp_resp resp = {0}; 984 1062 struct mana_rnic_destroy_rc_qp_req req = {0}; 985 1063 struct gdma_context *gc = mdev_to_gc(mdev); 986 - int err; 987 1064 988 1065 mana_gd_init_req_hdr(&req.hdr, MANA_IB_DESTROY_RC_QP, sizeof(req), sizeof(resp)); 989 1066 req.hdr.dev_id = mdev->gdma_dev->dev_id; 990 1067 req.adapter = mdev->adapter_handle; 991 1068 req.rc_qp_handle = qp->qp_handle; 992 - err = mana_gd_send_request(gc, sizeof(req), &req, sizeof(resp), &resp); 993 - if (err) { 994 - ibdev_err(&mdev->ib_dev, "Failed to destroy rc qp err %d", 
err); 995 - return err; 996 - } 997 - return 0; 1069 + 1070 + return mana_gd_send_request(gc, sizeof(req), &req, sizeof(resp), &resp); 998 1071 } 999 1072 1000 1073 int mana_ib_gd_create_ud_qp(struct mana_ib_dev *mdev, struct mana_ib_qp *qp, ··· 1018 1101 req.max_recv_sge = attr->cap.max_recv_sge; 1019 1102 req.qp_type = type; 1020 1103 err = mana_gd_send_request(gc, sizeof(req), &req, sizeof(resp), &resp); 1021 - if (err) { 1022 - ibdev_err(&mdev->ib_dev, "Failed to create ud qp err %d", err); 1104 + if (err) 1023 1105 return err; 1024 - } 1106 + 1025 1107 qp->qp_handle = resp.qp_handle; 1026 1108 for (i = 0; i < MANA_UD_QUEUE_TYPE_MAX; i++) { 1027 1109 qp->ud_qp.queues[i].id = resp.queue_ids[i]; ··· 1035 1119 struct mana_rnic_destroy_udqp_resp resp = {0}; 1036 1120 struct mana_rnic_destroy_udqp_req req = {0}; 1037 1121 struct gdma_context *gc = mdev_to_gc(mdev); 1038 - int err; 1039 1122 1040 1123 mana_gd_init_req_hdr(&req.hdr, MANA_IB_DESTROY_UD_QP, sizeof(req), sizeof(resp)); 1041 1124 req.hdr.dev_id = mdev->gdma_dev->dev_id; 1042 1125 req.adapter = mdev->adapter_handle; 1043 1126 req.qp_handle = qp->qp_handle; 1044 - err = mana_gd_send_request(gc, sizeof(req), &req, sizeof(resp), &resp); 1045 - if (err) { 1046 - ibdev_err(&mdev->ib_dev, "Failed to destroy ud qp err %d", err); 1047 - return err; 1048 - } 1049 - return 0; 1127 + 1128 + return mana_gd_send_request(gc, sizeof(req), &req, sizeof(resp), &resp); 1050 1129 }

+9 -1

drivers/infiniband/hw/mana/mana_ib.h

··· 8 8 9 9 #include <rdma/ib_verbs.h> 10 10 #include <rdma/ib_mad.h> 11 - #include <rdma/ib_umem.h> 11 + #include <rdma/iter.h> 12 12 #include <rdma/mana-abi.h> 13 13 #include <rdma/uverbs_ioctl.h> 14 14 #include <linux/dmapool.h> ··· 123 123 struct ib_ah ibah; 124 124 struct mana_ib_av *av; 125 125 dma_addr_t dma_handle; 126 + }; 127 + 128 + struct mana_ib_mw { 129 + struct ib_mw ibmw; 130 + mana_handle_t mw_handle; 126 131 }; 127 132 128 133 struct mana_ib_mr { ··· 740 735 void mana_drain_gsi_sqs(struct mana_ib_dev *mdev); 741 736 int mana_ib_poll_cq(struct ib_cq *ibcq, int num_entries, struct ib_wc *wc); 742 737 int mana_ib_arm_cq(struct ib_cq *ibcq, enum ib_cq_notify_flags flags); 738 + 739 + int mana_ib_alloc_mw(struct ib_mw *mw, struct ib_udata *udata); 740 + int mana_ib_dealloc_mw(struct ib_mw *mw); 743 741 744 742 struct ib_mr *mana_ib_reg_user_mr_dmabuf(struct ib_pd *ibpd, u64 start, u64 length, 745 743 u64 iova, int fd, int mr_access_flags,

+57 -35

drivers/infiniband/hw/mana/mr.c

··· 6 6 #include "mana_ib.h" 7 7 8 8 #define VALID_MR_FLAGS (IB_ACCESS_LOCAL_WRITE | IB_ACCESS_REMOTE_WRITE | IB_ACCESS_REMOTE_READ |\ 9 - IB_ACCESS_REMOTE_ATOMIC | IB_ZERO_BASED) 9 + IB_ACCESS_REMOTE_ATOMIC | IB_ACCESS_MW_BIND | IB_ZERO_BASED) 10 10 11 11 #define VALID_DMA_MR_FLAGS (IB_ACCESS_LOCAL_WRITE) 12 12 ··· 26 26 27 27 if (access_flags & IB_ACCESS_REMOTE_ATOMIC) 28 28 flags |= GDMA_ACCESS_FLAG_REMOTE_ATOMIC; 29 + 30 + if (access_flags & IB_ACCESS_MW_BIND) 31 + flags |= GDMA_ACCESS_FLAG_BIND_MW; 29 32 30 33 return flags; 31 34 } ··· 73 70 } 74 71 75 72 err = mana_gd_send_request(gc, sizeof(req), &req, sizeof(resp), &resp); 76 - 77 - if (err || resp.hdr.status) { 78 - ibdev_dbg(&dev->ib_dev, "Failed to create mr %d, %u", err, 79 - resp.hdr.status); 80 - if (!err) 81 - err = -EPROTO; 82 - 73 + if (err) 83 74 return err; 84 - } 85 75 86 76 mr->ibmr.lkey = resp.lkey; 87 77 mr->ibmr.rkey = resp.rkey; ··· 88 92 struct gdma_destroy_mr_response resp = {}; 89 93 struct gdma_destroy_mr_request req = {}; 90 94 struct gdma_context *gc = mdev_to_gc(dev); 91 - int err; 92 95 93 96 mana_gd_init_req_hdr(&req.hdr, GDMA_DESTROY_MR, sizeof(req), 94 97 sizeof(resp)); 95 98 96 99 req.mr_handle = mr_handle; 97 100 98 - err = mana_gd_send_request(gc, sizeof(req), &req, sizeof(resp), &resp); 99 - if (err || resp.hdr.status) { 100 - dev_err(gc->dev, "Failed to destroy MR: %d, 0x%x\n", err, 101 - resp.hdr.status); 102 - if (!err) 103 - err = -EPROTO; 104 - return err; 105 - } 106 - 107 - return 0; 101 + return mana_gd_send_request(gc, sizeof(req), &req, sizeof(resp), &resp); 108 102 } 109 103 110 104 struct ib_mr *mana_ib_reg_user_mr(struct ib_pd *ibpd, u64 start, u64 length, ··· 290 304 return ERR_PTR(err); 291 305 } 292 306 307 + static int mana_ib_gd_create_mw(struct mana_ib_dev *dev, struct mana_ib_pd *pd, struct ib_mw *ibmw) 308 + { 309 + struct mana_ib_mw *mw = container_of(ibmw, struct mana_ib_mw, ibmw); 310 + struct gdma_context *gc = mdev_to_gc(dev); 311 + struct 
gdma_create_mr_response resp = {}; 312 + struct gdma_create_mr_request req = {}; 313 + int err; 314 + 315 + mana_gd_init_req_hdr(&req.hdr, GDMA_CREATE_MR, sizeof(req), sizeof(resp)); 316 + req.hdr.req.msg_version = GDMA_MESSAGE_V2; 317 + req.pd_handle = pd->pd_handle; 318 + 319 + switch (mw->ibmw.type) { 320 + case IB_MW_TYPE_1: 321 + req.mr_type = GDMA_MR_TYPE_MW1; 322 + break; 323 + case IB_MW_TYPE_2: 324 + req.mr_type = GDMA_MR_TYPE_MW2; 325 + break; 326 + default: 327 + return -EINVAL; 328 + } 329 + 330 + err = mana_gd_send_request(gc, sizeof(req), &req, sizeof(resp), &resp); 331 + if (err) 332 + return err; 333 + 334 + mw->ibmw.rkey = resp.rkey; 335 + mw->mw_handle = resp.mr_handle; 336 + 337 + return 0; 338 + } 339 + 340 + int mana_ib_alloc_mw(struct ib_mw *ibmw, struct ib_udata *udata) 341 + { 342 + struct mana_ib_dev *mdev = container_of(ibmw->device, struct mana_ib_dev, ib_dev); 343 + struct mana_ib_pd *pd = container_of(ibmw->pd, struct mana_ib_pd, ibpd); 344 + 345 + return mana_ib_gd_create_mw(mdev, pd, ibmw); 346 + } 347 + 348 + int mana_ib_dealloc_mw(struct ib_mw *ibmw) 349 + { 350 + struct mana_ib_dev *dev = container_of(ibmw->device, struct mana_ib_dev, ib_dev); 351 + struct mana_ib_mw *mw = container_of(ibmw, struct mana_ib_mw, ibmw); 352 + 353 + return mana_ib_gd_destroy_mr(dev, mw->mw_handle); 354 + } 355 + 293 356 int mana_ib_dereg_mr(struct ib_mr *ibmr, struct ib_udata *udata) 294 357 { 295 358 struct mana_ib_mr *mr = container_of(ibmr, struct mana_ib_mr, ibmr); ··· 374 339 req.flags = attr->flags; 375 340 376 341 err = mana_gd_send_request(gc, sizeof(req), &req, sizeof(resp), &resp); 377 - if (err || resp.hdr.status) { 378 - if (!err) 379 - err = -EPROTO; 380 - 342 + if (err) 381 343 return err; 382 - } 383 344 384 345 dm->dm_handle = resp.dm_handle; 385 346 ··· 411 380 struct gdma_context *gc = mdev_to_gc(mdev); 412 381 struct gdma_destroy_dm_resp resp = {}; 413 382 struct gdma_destroy_dm_req req = {}; 414 - int err; 415 383 416 384 
mana_gd_init_req_hdr(&req.hdr, GDMA_DESTROY_DM, sizeof(req), sizeof(resp)); 417 385 req.dm_handle = dm->dm_handle; 418 386 419 - err = mana_gd_send_request(gc, sizeof(req), &req, sizeof(resp), &resp); 420 - if (err || resp.hdr.status) { 421 - if (!err) 422 - err = -EPROTO; 423 - 424 - return err; 425 - } 426 - 427 - return 0; 387 + return mana_gd_send_request(gc, sizeof(req), &req, sizeof(resp), &resp); 428 388 } 429 389 430 390 int mana_ib_dealloc_dm(struct ib_dm *ibdm, struct uverbs_attr_bundle *attrs)

+26 -43

drivers/infiniband/hw/mana/qp.c

··· 68 68 req->vport, default_rxobj); 69 69 70 70 err = mana_gd_send_request(gc, req_buf_size, req, sizeof(resp), &resp); 71 - if (err) { 72 - netdev_err(ndev, "Failed to configure vPort RX: %d\n", err); 73 - goto out; 74 - } 75 - 76 - if (resp.hdr.status) { 77 - netdev_err(ndev, "vPort RX configuration failed: 0x%x\n", 78 - resp.hdr.status); 79 - err = -EPROTO; 80 - goto out; 81 - } 82 - 83 - netdev_info(ndev, "Configured steering vPort %llu log_entries %u\n", 84 - mpc->port_handle, log_ind_tbl_size); 85 - 86 - out: 87 71 kfree(req); 88 72 return err; 89 73 } ··· 81 97 container_of(pd->device, struct mana_ib_dev, ib_dev); 82 98 struct ib_rwq_ind_table *ind_tbl = attr->rwq_ind_tbl; 83 99 struct mana_ib_create_qp_rss_resp resp = {}; 84 - struct mana_ib_create_qp_rss ucmd = {}; 100 + struct mana_ib_create_qp_rss ucmd; 85 101 mana_handle_t *mana_ind_table; 86 102 struct mana_port_context *mpc; 87 103 unsigned int ind_tbl_size; ··· 95 111 u32 port; 96 112 int ret; 97 113 98 - if (!udata || udata->inlen < sizeof(ucmd)) 114 + if (!udata) 99 115 return -EINVAL; 100 116 101 - ret = ib_copy_from_udata(&ucmd, udata, min(sizeof(ucmd), udata->inlen)); 102 - if (ret) { 103 - ibdev_dbg(&mdev->ib_dev, 104 - "Failed copy from udata for create rss-qp, err %d\n", 105 - ret); 117 + ret = ib_copy_validate_udata_in(udata, ucmd, port); 118 + if (ret) 106 119 return ret; 107 - } 108 120 109 121 if (attr->cap.max_recv_wr > mdev->adapter_caps.max_qp_wr) { 110 122 ibdev_dbg(&mdev->ib_dev, ··· 262 282 u32 port; 263 283 int err; 264 284 265 - if (!mana_ucontext || udata->inlen < sizeof(ucmd)) 285 + if (!mana_ucontext) 266 286 return -EINVAL; 267 287 268 - err = ib_copy_from_udata(&ucmd, udata, min(sizeof(ucmd), udata->inlen)); 269 - if (err) { 270 - ibdev_dbg(&mdev->ib_dev, 271 - "Failed to copy from udata create qp-raw, %d\n", err); 288 + err = ib_copy_validate_udata_in(udata, ucmd, port); 289 + if (err) 272 290 return err; 273 - } 274 291 275 292 if (attr->cap.max_send_wr > 
mdev->adapter_caps.max_qp_wr) { 276 293 ibdev_dbg(&mdev->ib_dev, ··· 512 535 u64 flags = 0; 513 536 u32 doorbell; 514 537 515 - if (!udata || udata->inlen < sizeof(ucmd)) 538 + if (!udata) 516 539 return -EINVAL; 517 540 518 541 mana_ucontext = rdma_udata_to_drv_context(udata, struct mana_ib_ucontext, ibucontext); 519 542 doorbell = mana_ucontext->doorbell; 520 543 flags = MANA_RC_FLAG_NO_FMR; 521 - err = ib_copy_from_udata(&ucmd, udata, min(sizeof(ucmd), udata->inlen)); 522 - if (err) { 523 - ibdev_dbg(&mdev->ib_dev, "Failed to copy from udata, %d\n", err); 544 + err = ib_copy_validate_udata_in(udata, ucmd, queue_size); 545 + if (err) 524 546 return err; 525 - } 526 547 527 548 for (i = 0, j = 0; i < MANA_RC_QUEUE_TYPE_MAX; ++i) { 528 549 /* skip FMR for user-level RC QPs */ ··· 706 731 struct gdma_context *gc = mdev_to_gc(mdev); 707 732 struct mana_port_context *mpc; 708 733 struct net_device *ndev; 709 - int err; 710 734 711 735 mana_gd_init_req_hdr(&req.hdr, MANA_IB_SET_QP_STATE, sizeof(req), sizeof(resp)); 712 736 ··· 758 784 req.ah_attr.flow_label = attr->ah_attr.grh.flow_label; 759 785 } 760 786 761 - err = mana_gd_send_request(gc, sizeof(req), &req, sizeof(resp), &resp); 762 - if (err) { 763 - ibdev_err(&mdev->ib_dev, "Failed modify qp err %d", err); 764 - return err; 765 - } 766 - 767 - return 0; 787 + return mana_gd_send_request(gc, sizeof(req), &req, sizeof(resp), &resp); 768 788 } 769 789 770 790 int mana_ib_modify_qp(struct ib_qp *ibqp, struct ib_qp_attr *attr, ··· 789 821 790 822 ndev = mana_ib_get_netdev(qp->ibqp.device, qp->port); 791 823 mpc = netdev_priv(ndev); 824 + 825 + /* Disable vPort RX steering before destroying RX WQ objects. 826 + * Otherwise firmware still routes traffic to the destroyed queues, 827 + * which can cause bogus completions on reused CQ IDs when the 828 + * ethernet driver later creates new queues on mana_open(). 
829 + * 830 + * Unlike the ethernet teardown path, mana_fence_rqs() cannot be 831 + * used here because the fence completion CQE is delivered on the 832 + * CQ which is polled by userspace (e.g. DPDK), so there is no way 833 + * for the kernel to wait for fence completion. 834 + * 835 + * This is best effort — if it fails there is not much we can do, 836 + * and mana_cfg_vport_steering() already logs the error. 837 + */ 838 + mana_disable_vport_rx(mpc); 792 839 793 840 for (i = 0; i < (1 << ind_tbl->log_ind_tbl_size); i++) { 794 841 ibwq = ind_tbl->ind_tbl[i];

+3 -9

drivers/infiniband/hw/mana/wq.c

··· 11 11 { 12 12 struct mana_ib_dev *mdev = 13 13 container_of(pd->device, struct mana_ib_dev, ib_dev); 14 - struct mana_ib_create_wq ucmd = {}; 14 + struct mana_ib_create_wq ucmd; 15 15 struct mana_ib_wq *wq; 16 16 int err; 17 17 18 - if (udata->inlen < sizeof(ucmd)) 19 - return ERR_PTR(-EINVAL); 20 - 21 - err = ib_copy_from_udata(&ucmd, udata, min(sizeof(ucmd), udata->inlen)); 22 - if (err) { 23 - ibdev_dbg(&mdev->ib_dev, 24 - "Failed to copy from udata for create wq, %d\n", err); 18 + err = ib_copy_validate_udata_in(udata, ucmd, reserved); 19 + if (err) 25 20 return ERR_PTR(err); 26 - } 27 21 28 22 wq = kzalloc_obj(*wq); 29 23 if (!wq)

+154 -116

drivers/infiniband/hw/mlx4/cq.c

··· 135 135 mlx4_buf_free(dev->dev, (cqe + 1) * buf->entry_size, &buf->buf); 136 136 } 137 137 138 - static int mlx4_ib_get_cq_umem(struct mlx4_ib_dev *dev, 139 - struct mlx4_ib_cq_buf *buf, 140 - struct ib_umem **umem, u64 buf_addr, int cqe) 141 - { 142 - int err; 143 - int cqe_size = dev->dev->caps.cqe_size; 144 - int shift; 145 - int n; 146 - 147 - *umem = ib_umem_get(&dev->ib_dev, buf_addr, cqe * cqe_size, 148 - IB_ACCESS_LOCAL_WRITE); 149 - if (IS_ERR(*umem)) 150 - return PTR_ERR(*umem); 151 - 152 - shift = mlx4_ib_umem_calc_optimal_mtt_size(*umem, 0, &n); 153 - if (shift < 0) { 154 - err = shift; 155 - goto err_buf; 156 - } 157 - 158 - err = mlx4_mtt_init(dev->dev, n, shift, &buf->mtt); 159 - if (err) 160 - goto err_buf; 161 - 162 - err = mlx4_ib_umem_write_mtt(dev, &buf->mtt, *umem); 163 - if (err) 164 - goto err_mtt; 165 - 166 - return 0; 167 - 168 - err_mtt: 169 - mlx4_mtt_cleanup(dev->dev, &buf->mtt); 170 - 171 - err_buf: 172 - ib_umem_release(*umem); 173 - 174 - return err; 175 - } 176 - 177 138 #define CQ_CREATE_FLAGS_SUPPORTED IB_UVERBS_CQ_FLAGS_TIMESTAMP_COMPLETION 178 - int mlx4_ib_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr, 179 - struct uverbs_attr_bundle *attrs) 139 + int mlx4_ib_create_user_cq(struct ib_cq *ibcq, 140 + const struct ib_cq_init_attr *attr, 141 + struct uverbs_attr_bundle *attrs) 180 142 { 181 143 struct ib_udata *udata = &attrs->driver_udata; 182 144 struct ib_device *ibdev = ibcq->device; ··· 146 184 int vector = attr->comp_vector; 147 185 struct mlx4_ib_dev *dev = to_mdev(ibdev); 148 186 struct mlx4_ib_cq *cq = to_mcq(ibcq); 149 - struct mlx4_uar *uar; 187 + struct mlx4_ib_create_cq ucmd; 188 + int cqe_size = dev->dev->caps.cqe_size; 150 189 void *buf_addr; 190 + int shift; 191 + int n; 151 192 int err; 152 193 struct mlx4_ib_ucontext *context = rdma_udata_to_drv_context( 153 194 udata, struct mlx4_ib_ucontext, ibucontext); 154 195 155 - if (entries < 1 || entries > dev->dev->caps.max_cqes) 196 + if 
(attr->cqe > dev->dev->caps.max_cqes) 156 197 return -EINVAL; 157 198 158 199 if (attr->flags & ~CQ_CREATE_FLAGS_SUPPORTED) ··· 165 200 cq->ibcq.cqe = entries - 1; 166 201 mutex_init(&cq->resize_mutex); 167 202 spin_lock_init(&cq->lock); 168 - cq->resize_buf = NULL; 169 - cq->resize_umem = NULL; 170 - cq->create_flags = attr->flags; 171 203 INIT_LIST_HEAD(&cq->send_qp_list); 172 204 INIT_LIST_HEAD(&cq->recv_qp_list); 173 205 174 - if (udata) { 175 - struct mlx4_ib_create_cq ucmd; 206 + err = ib_copy_validate_udata_in(udata, ucmd, db_addr); 207 + if (err) 208 + goto err_cq; 176 209 177 - if (ib_copy_from_udata(&ucmd, udata, sizeof ucmd)) { 178 - err = -EFAULT; 179 - goto err_cq; 180 - } 210 + if (ibcq->umem && 211 + (dev->dev->caps.flags2 & MLX4_DEV_CAP_FLAG2_SW_CQ_INIT)) 212 + return -EOPNOTSUPP; 181 213 182 - buf_addr = (void *)(unsigned long)ucmd.buf_addr; 183 - err = mlx4_ib_get_cq_umem(dev, &cq->buf, &cq->umem, 184 - ucmd.buf_addr, entries); 185 - if (err) 186 - goto err_cq; 214 + buf_addr = (void *)(unsigned long)ucmd.buf_addr; 187 215 188 - err = mlx4_ib_db_map_user(udata, ucmd.db_addr, &cq->db); 189 - if (err) 190 - goto err_mtt; 191 - 192 - uar = &context->uar; 193 - cq->mcq.usage = MLX4_RES_USAGE_USER_VERBS; 194 - } else { 195 - err = mlx4_db_alloc(dev->dev, &cq->db, 1); 196 - if (err) 197 - goto err_cq; 198 - 199 - cq->mcq.set_ci_db = cq->db.db; 200 - cq->mcq.arm_db = cq->db.db + 1; 201 - *cq->mcq.set_ci_db = 0; 202 - *cq->mcq.arm_db = 0; 203 - 204 - err = mlx4_ib_alloc_cq_buf(dev, &cq->buf, entries); 205 - if (err) 206 - goto err_db; 207 - 208 - buf_addr = &cq->buf.buf; 209 - 210 - uar = &dev->priv_uar; 211 - cq->mcq.usage = MLX4_RES_USAGE_DRIVER; 216 + if (!ibcq->umem) 217 + ibcq->umem = ib_umem_get(&dev->ib_dev, ucmd.buf_addr, 218 + entries * cqe_size, 219 + IB_ACCESS_LOCAL_WRITE); 220 + if (IS_ERR(ibcq->umem)) { 221 + err = PTR_ERR(ibcq->umem); 222 + goto err_cq; 212 223 } 224 + 225 + shift = mlx4_ib_umem_calc_optimal_mtt_size(cq->ibcq.umem, 0, &n); 
226 + if (shift < 0) { 227 + err = shift; 228 + goto err_cq; 229 + } 230 + 231 + err = mlx4_mtt_init(dev->dev, n, shift, &cq->buf.mtt); 232 + if (err) 233 + goto err_cq; 234 + 235 + err = mlx4_ib_umem_write_mtt(dev, &cq->buf.mtt, cq->ibcq.umem); 236 + if (err) 237 + goto err_mtt; 238 + 239 + err = mlx4_ib_db_map_user(udata, ucmd.db_addr, &cq->db); 240 + if (err) 241 + goto err_mtt; 213 242 214 243 if (dev->eq_table) 215 244 vector = dev->eq_table[vector % ibdev->num_comp_vectors]; 216 245 217 - err = mlx4_cq_alloc(dev->dev, entries, &cq->buf.mtt, uar, cq->db.dma, 218 - &cq->mcq, vector, 0, 219 - !!(cq->create_flags & 220 - IB_UVERBS_CQ_FLAGS_TIMESTAMP_COMPLETION), 221 - buf_addr, !!udata); 246 + err = mlx4_cq_alloc(dev->dev, entries, &cq->buf.mtt, &context->uar, 247 + cq->db.dma, &cq->mcq, vector, 0, 248 + attr->flags & IB_UVERBS_CQ_FLAGS_TIMESTAMP_COMPLETION, 249 + buf_addr, true); 222 250 if (err) 223 251 goto err_dbmap; 224 252 225 - if (udata) 226 - cq->mcq.tasklet_ctx.comp = mlx4_ib_cq_comp; 227 - else 228 - cq->mcq.comp = mlx4_ib_cq_comp; 253 + cq->mcq.tasklet_ctx.comp = mlx4_ib_cq_comp; 229 254 cq->mcq.event = mlx4_ib_cq_event; 255 + cq->mcq.usage = MLX4_RES_USAGE_USER_VERBS; 230 256 231 - if (udata) 232 - if (ib_copy_to_udata(udata, &cq->mcq.cqn, sizeof (__u32))) { 233 - err = -EFAULT; 234 - goto err_cq_free; 235 - } 257 + if (ib_copy_to_udata(udata, &cq->mcq.cqn, sizeof(__u32))) { 258 + err = -EFAULT; 259 + goto err_cq_free; 260 + } 236 261 237 262 return 0; 238 263 ··· 230 275 mlx4_cq_free(dev->dev, &cq->mcq); 231 276 232 277 err_dbmap: 233 - if (udata) 234 - mlx4_ib_db_unmap_user(context, &cq->db); 278 + mlx4_ib_db_unmap_user(context, &cq->db); 235 279 236 280 err_mtt: 237 281 mlx4_mtt_cleanup(dev->dev, &cq->buf.mtt); 282 + /* UMEM is released by ib_core */ 238 283 239 - ib_umem_release(cq->umem); 240 - if (!udata) 241 - mlx4_ib_free_cq_buf(dev, &cq->buf, cq->ibcq.cqe); 284 + err_cq: 285 + return err; 286 + } 287 + 288 + int mlx4_ib_create_cq(struct 
ib_cq *ibcq, const struct ib_cq_init_attr *attr, 289 + struct uverbs_attr_bundle *attrs) 290 + { 291 + struct ib_device *ibdev = ibcq->device; 292 + int entries = attr->cqe; 293 + int vector = attr->comp_vector; 294 + struct mlx4_ib_dev *dev = to_mdev(ibdev); 295 + struct mlx4_ib_cq *cq = to_mcq(ibcq); 296 + void *buf_addr; 297 + int err; 298 + 299 + if (attr->cqe > dev->dev->caps.max_cqes) 300 + return -EINVAL; 301 + 302 + entries = roundup_pow_of_two(entries + 1); 303 + cq->ibcq.cqe = entries - 1; 304 + mutex_init(&cq->resize_mutex); 305 + spin_lock_init(&cq->lock); 306 + INIT_LIST_HEAD(&cq->send_qp_list); 307 + INIT_LIST_HEAD(&cq->recv_qp_list); 308 + 309 + err = mlx4_db_alloc(dev->dev, &cq->db, 1); 310 + if (err) 311 + return err; 312 + 313 + cq->mcq.set_ci_db = cq->db.db; 314 + cq->mcq.arm_db = cq->db.db + 1; 315 + *cq->mcq.set_ci_db = 0; 316 + *cq->mcq.arm_db = 0; 317 + 318 + err = mlx4_ib_alloc_cq_buf(dev, &cq->buf, entries); 319 + if (err) 320 + goto err_db; 321 + 322 + buf_addr = &cq->buf.buf; 323 + 324 + if (dev->eq_table) 325 + vector = dev->eq_table[vector % ibdev->num_comp_vectors]; 326 + 327 + err = mlx4_cq_alloc(dev->dev, entries, &cq->buf.mtt, &dev->priv_uar, 328 + cq->db.dma, &cq->mcq, vector, 0, 0, 329 + buf_addr, false); 330 + if (err) 331 + goto err_buf; 332 + 333 + cq->mcq.comp = mlx4_ib_cq_comp; 334 + cq->mcq.event = mlx4_ib_cq_event; 335 + cq->mcq.usage = MLX4_RES_USAGE_DRIVER; 336 + 337 + return 0; 338 + 339 + err_buf: 340 + mlx4_ib_free_cq_buf(dev, &cq->buf, cq->ibcq.cqe); 242 341 243 342 err_db: 244 - if (!udata) 245 - mlx4_db_free(dev->dev, &cq->db); 246 - err_cq: 343 + mlx4_db_free(dev->dev, &cq->db); 247 344 return err; 248 345 } 249 346 ··· 327 320 int entries, struct ib_udata *udata) 328 321 { 329 322 struct mlx4_ib_resize_cq ucmd; 323 + int cqe_size = dev->dev->caps.cqe_size; 324 + int shift; 325 + int n; 330 326 int err; 331 327 332 328 if (cq->resize_umem) 333 329 return -EBUSY; 334 330 335 - if (ib_copy_from_udata(&ucmd, udata, 
sizeof ucmd)) 336 - return -EFAULT; 331 + err = ib_copy_validate_udata_in(udata, ucmd, buf_addr); 332 + if (err) 333 + return err; 337 334 338 335 cq->resize_buf = kmalloc_obj(*cq->resize_buf); 339 336 if (!cq->resize_buf) 340 337 return -ENOMEM; 341 338 342 - err = mlx4_ib_get_cq_umem(dev, &cq->resize_buf->buf, &cq->resize_umem, 343 - ucmd.buf_addr, entries); 344 - if (err) { 345 - kfree(cq->resize_buf); 346 - cq->resize_buf = NULL; 347 - return err; 339 + cq->resize_umem = ib_umem_get(&dev->ib_dev, ucmd.buf_addr, 340 + entries * cqe_size, 341 + IB_ACCESS_LOCAL_WRITE); 342 + if (IS_ERR(cq->resize_umem)) { 343 + err = PTR_ERR(cq->resize_umem); 344 + goto err_buf; 348 345 } 346 + 347 + shift = mlx4_ib_umem_calc_optimal_mtt_size(cq->resize_umem, 0, &n); 348 + if (shift < 0) { 349 + err = shift; 350 + goto err_umem; 351 + } 352 + 353 + err = mlx4_mtt_init(dev->dev, n, shift, &cq->resize_buf->buf.mtt); 354 + if (err) 355 + goto err_umem; 356 + 357 + err = mlx4_ib_umem_write_mtt(dev, &cq->resize_buf->buf.mtt, 358 + cq->resize_umem); 359 + if (err) 360 + goto err_mtt; 349 361 350 362 cq->resize_buf->cqe = entries - 1; 351 363 352 364 return 0; 365 + 366 + err_mtt: 367 + mlx4_mtt_cleanup(dev->dev, &cq->resize_buf->buf.mtt); 368 + 369 + err_umem: 370 + ib_umem_release(cq->resize_umem); 371 + 372 + err_buf: 373 + kfree(cq->resize_buf); 374 + cq->resize_buf = NULL; 375 + return err; 353 376 } 354 377 355 378 static int mlx4_ib_get_outstanding_cqes(struct mlx4_ib_cq *cq) ··· 418 381 ++cq->mcq.cons_index; 419 382 } 420 383 421 - int mlx4_ib_resize_cq(struct ib_cq *ibcq, int entries, struct ib_udata *udata) 384 + int mlx4_ib_resize_cq(struct ib_cq *ibcq, unsigned int entries, 385 + struct ib_udata *udata) 422 386 { 423 387 struct mlx4_ib_dev *dev = to_mdev(ibcq->device); 424 388 struct mlx4_ib_cq *cq = to_mcq(ibcq); ··· 428 390 int err; 429 391 430 392 mutex_lock(&cq->resize_mutex); 431 - if (entries < 1 || entries > dev->dev->caps.max_cqes) { 393 + if (entries > 
dev->dev->caps.max_cqes) { 432 394 err = -EINVAL; 433 395 goto out; 434 396 } ··· 471 433 if (ibcq->uobject) { 472 434 cq->buf = cq->resize_buf->buf; 473 435 cq->ibcq.cqe = cq->resize_buf->cqe; 474 - ib_umem_release(cq->umem); 475 - cq->umem = cq->resize_umem; 436 + ib_umem_release(cq->ibcq.umem); 437 + cq->ibcq.umem = cq->resize_umem; 476 438 477 439 kfree(cq->resize_buf); 478 440 cq->resize_buf = NULL; ··· 532 494 struct mlx4_ib_ucontext, 533 495 ibucontext), 534 496 &mcq->db); 497 + /* UMEM is released by ib_core */ 535 498 } else { 536 499 mlx4_ib_free_cq_buf(dev, &mcq->buf, cq->cqe); 537 500 mlx4_db_free(dev->dev, &mcq->db); 538 501 } 539 - ib_umem_release(mcq->umem); 540 502 return 0; 541 503 } 542 504
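The mlx4 CQ paths keep the same sizing rule after the split. The requested count is checked against `caps.max_cqes`, one is added, and the result is rounded up to a power of two for the hardware ring. `entries - 1` is then reported back in `ibcq.cqe`. Because the count is now unsigned (`resize_cq` takes `unsigned int entries`), the old `entries < 1` tests are dropped. The sketch below is a user-space model of that arithmetic only. `round_up_pow2()` and `cq_hw_entries()` are hypothetical helpers, not driver functions.

```c
#include <assert.h>
#include <stdint.h>

/* Smallest power of two >= v; valid for 1 <= v <= 2^31. */
static uint32_t round_up_pow2(uint32_t v)
{
	uint32_t p = 1;

	assert(v >= 1 && v <= (UINT32_C(1) << 31));
	while (p < v)
		p <<= 1;
	return p;
}

/*
 * Hypothetical model of the mlx4 sizing: returns the number of ring
 * entries to allocate, or 0 where the driver would return -EINVAL.
 * *reported_cqe receives the value the driver stores in ibcq.cqe.
 */
static uint32_t cq_hw_entries(uint32_t requested, uint32_t max_cqes,
			      uint32_t *reported_cqe)
{
	uint32_t entries;

	if (requested > max_cqes)
		return 0;
	entries = round_up_pow2(requested + 1);
	*reported_cqe = entries - 1;
	return entries;
}
```

Under this rule, a request for exactly a power of two doubles the ring: asking for 128 entries allocates 256 and reports 255.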

+5 -9

drivers/infiniband/hw/mlx4/main.c

··· 50 50 #include <rdma/ib_user_verbs.h> 51 51 #include <rdma/ib_addr.h> 52 52 #include <rdma/ib_cache.h> 53 + #include <rdma/uverbs_ioctl.h> 53 54 54 55 #include <net/bonding.h> 55 56 ··· 446 445 struct mlx4_clock_params clock_params; 447 446 448 447 if (uhw->inlen) { 449 - if (uhw->inlen < sizeof(cmd)) 450 - return -EINVAL; 451 - 452 - err = ib_copy_from_udata(&cmd, uhw, sizeof(cmd)); 448 + err = ib_copy_validate_udata_in_cm(uhw, cmd, reserved, 0); 453 449 if (err) 454 450 return err; 455 - 456 - if (cmd.comp_mask) 457 - return -EINVAL; 458 451 459 452 if (cmd.reserved) 460 453 return -EINVAL; ··· 2156 2161 if (!*pdescs) 2157 2162 return -ENOMEM; 2158 2163 2159 - *offset = kcalloc(num_counters, sizeof(**offset), GFP_KERNEL); 2164 + *offset = kzalloc_objs(**offset, num_counters); 2160 2165 if (!*offset) 2161 2166 goto err; 2162 2167 ··· 2520 2525 .attach_mcast = mlx4_ib_mcg_attach, 2521 2526 .create_ah = mlx4_ib_create_ah, 2522 2527 .create_cq = mlx4_ib_create_cq, 2528 + .create_user_cq = mlx4_ib_create_user_cq, 2523 2529 .create_qp = mlx4_ib_create_qp, 2524 2530 .create_srq = mlx4_ib_create_srq, 2525 2531 .dealloc_pd = mlx4_ib_dealloc_pd, ··· 2563 2567 .reg_user_mr = mlx4_ib_reg_user_mr, 2564 2568 .req_notify_cq = mlx4_ib_arm_cq, 2565 2569 .rereg_user_mr = mlx4_ib_rereg_user_mr, 2566 - .resize_cq = mlx4_ib_resize_cq, 2570 + .resize_user_cq = mlx4_ib_resize_cq, 2567 2571 .report_port_event = mlx4_ib_port_event, 2568 2572 2569 2573 INIT_RDMA_OBJ_SIZE(ib_ah, mlx4_ib_ah, ibah),
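Registering `mlx4_ib_create_user_cq` as `.create_user_cq` next to `.create_cq` is what lets each function drop its `if (udata)` branches. One callback only ever sees userspace requests, and the other only sees kernel consumers. This diff does not show how ib_core chooses between them. The dispatcher below is a hypothetical model that assumes the choice depends on whether the request carries user data. Every type and function in it is a stand-in.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are not the kernel's ib_device_ops. */
struct udata { size_t inlen; };

struct cq_ops {
	int (*create_cq)(unsigned int cqe);                       /* kernel ULPs */
	int (*create_user_cq)(unsigned int cqe, struct udata *u); /* uverbs */
};

/* Each driver callback handles exactly one case, so neither needs an
 * "if (udata)" branch. Return values tag which path ran. */
static int drv_create_cq(unsigned int cqe)
{
	(void)cqe;
	return 1;
}

static int drv_create_user_cq(unsigned int cqe, struct udata *u)
{
	(void)cqe;
	(void)u;
	return 2;
}

/* Hypothetical caller: route on whether the request came with user data. */
static int create_cq(const struct cq_ops *ops, unsigned int cqe,
		     struct udata *u)
{
	if (u)
		return ops->create_user_cq ? ops->create_user_cq(cqe, u)
					   : -EOPNOTSUPP;
	return ops->create_cq ? ops->create_cq(cqe) : -EOPNOTSUPP;
}

static const struct cq_ops demo_ops = {
	.create_cq = drv_create_cq,
	.create_user_cq = drv_create_user_cq,
};
```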

+5 -3

drivers/infiniband/hw/mlx4/mlx4_ib.h

··· 121 121 struct mlx4_db db; 122 122 spinlock_t lock; 123 123 struct mutex resize_mutex; 124 - struct ib_umem *umem; 125 124 struct ib_umem *resize_umem; 126 - int create_flags; 127 125 /* List of qps that it serves.*/ 128 126 struct list_head send_qp_list; 129 127 struct list_head recv_qp_list; ··· 767 769 int mlx4_ib_map_mr_sg(struct ib_mr *ibmr, struct scatterlist *sg, int sg_nents, 768 770 unsigned int *sg_offset); 769 771 int mlx4_ib_modify_cq(struct ib_cq *cq, u16 cq_count, u16 cq_period); 770 - int mlx4_ib_resize_cq(struct ib_cq *ibcq, int entries, struct ib_udata *udata); 772 + int mlx4_ib_resize_cq(struct ib_cq *ibcq, unsigned int entries, 773 + struct ib_udata *udata); 771 774 int mlx4_ib_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr, 772 775 struct uverbs_attr_bundle *attrs); 776 + int mlx4_ib_create_user_cq(struct ib_cq *ibcq, 777 + const struct ib_cq_init_attr *attr, 778 + struct uverbs_attr_bundle *attrs); 773 779 int mlx4_ib_destroy_cq(struct ib_cq *cq, struct ib_udata *udata); 774 780 int mlx4_ib_poll_cq(struct ib_cq *ibcq, int num_entries, struct ib_wc *wc); 775 781 int mlx4_ib_arm_cq(struct ib_cq *cq, enum ib_cq_notify_flags flags);

+1

drivers/infiniband/hw/mlx4/mr.c

··· 33 33 34 34 #include <linux/slab.h> 35 35 #include <rdma/ib_user_verbs.h> 36 + #include <rdma/iter.h> 36 37 37 38 #include "mlx4_ib.h" 38 39

+17 -65

drivers/infiniband/hw/mlx4/qp.c

··· 709 709 struct ib_qp_init_attr *init_attr, 710 710 struct ib_udata *udata) 711 711 { 712 - struct mlx4_ib_create_qp_rss ucmd = {}; 713 - size_t required_cmd_sz; 712 + struct mlx4_ib_create_qp_rss ucmd; 714 713 int err; 715 714 716 715 if (!udata) { ··· 720 721 if (udata->outlen) 721 722 return -EOPNOTSUPP; 722 723 723 - required_cmd_sz = offsetof(typeof(ucmd), reserved1) + 724 - sizeof(ucmd.reserved1); 725 - if (udata->inlen < required_cmd_sz) { 726 - pr_debug("invalid inlen\n"); 727 - return -EINVAL; 728 - } 729 - 730 - if (ib_copy_from_udata(&ucmd, udata, min(sizeof(ucmd), udata->inlen))) { 724 + err = ib_copy_validate_udata_in_cm(udata, ucmd, reserved1, 0); 725 + if (err) { 731 726 pr_debug("copy failed\n"); 732 - return -EFAULT; 727 + return err; 733 728 } 734 729 735 730 if (memchr_inv(ucmd.reserved, 0, sizeof(ucmd.reserved))) 736 731 return -EOPNOTSUPP; 737 732 738 - if (ucmd.comp_mask || ucmd.reserved1) 733 + if (ucmd.reserved1) 739 734 return -EOPNOTSUPP; 740 - 741 - if (udata->inlen > sizeof(ucmd) && 742 - !ib_is_udata_cleared(udata, sizeof(ucmd), 743 - udata->inlen - sizeof(ucmd))) { 744 - pr_debug("inlen is not supported\n"); 745 - return -EOPNOTSUPP; 746 - } 747 735 748 736 if (init_attr->qp_type != IB_QPT_RAW_PACKET) { 749 737 pr_debug("RSS QP with unsupported QP type %d\n", ··· 854 868 unsigned long flags; 855 869 int range_size; 856 870 struct mlx4_ib_create_wq wq; 857 - size_t copy_len; 858 871 int shift; 859 872 int n; 860 873 ··· 866 881 867 882 qp->state = IB_QPS_RESET; 868 883 869 - copy_len = min(sizeof(struct mlx4_ib_create_wq), udata->inlen); 870 - 871 - if (ib_copy_from_udata(&wq, udata, copy_len)) { 872 - err = -EFAULT; 884 + err = ib_copy_validate_udata_in_cm(udata, wq, comp_mask, 0); 885 + if (err) 873 886 goto err; 874 - } 875 887 876 - if (wq.comp_mask || wq.reserved[0] || wq.reserved[1] || 877 - wq.reserved[2]) { 888 + if (wq.reserved[0] || wq.reserved[1] || wq.reserved[2]) { 878 889 pr_debug("user command isn't supported\n"); 879 
890 err = -EOPNOTSUPP; 880 891 goto err; ··· 1048 1067 1049 1068 if (udata) { 1050 1069 struct mlx4_ib_create_qp ucmd; 1051 - size_t copy_len; 1052 1070 int shift; 1053 1071 int n; 1054 1072 1055 - copy_len = sizeof(struct mlx4_ib_create_qp); 1056 - 1057 - if (ib_copy_from_udata(&ucmd, udata, copy_len)) { 1058 - err = -EFAULT; 1073 + err = ib_copy_validate_udata_in(udata, ucmd, sq_no_prefetch); 1074 + if (err) 1059 1075 goto err; 1060 - } 1061 1076 1062 1077 qp->inl_recv_sz = ucmd.inl_recv_sz; 1063 1078 ··· 4107 4130 struct mlx4_dev *dev = to_mdev(pd->device)->dev; 4108 4131 struct ib_qp_init_attr ib_qp_init_attr = {}; 4109 4132 struct mlx4_ib_qp *qp; 4110 - struct mlx4_ib_create_wq ucmd; 4111 - int err, required_cmd_sz; 4133 + int err; 4112 4134 4113 4135 if (!udata) 4114 4136 return ERR_PTR(-EINVAL); 4115 - 4116 - required_cmd_sz = offsetof(typeof(ucmd), comp_mask) + 4117 - sizeof(ucmd.comp_mask); 4118 - if (udata->inlen < required_cmd_sz) { 4119 - pr_debug("invalid inlen\n"); 4120 - return ERR_PTR(-EINVAL); 4121 - } 4122 - 4123 - if (udata->inlen > sizeof(ucmd) && 4124 - !ib_is_udata_cleared(udata, sizeof(ucmd), 4125 - udata->inlen - sizeof(ucmd))) { 4126 - pr_debug("inlen is not supported\n"); 4127 - return ERR_PTR(-EOPNOTSUPP); 4128 - } 4129 4137 4130 4138 if (udata->outlen) 4131 4139 return ERR_PTR(-EOPNOTSUPP); ··· 4230 4268 u32 wq_attr_mask, struct ib_udata *udata) 4231 4269 { 4232 4270 struct mlx4_ib_qp *qp = to_mqp((struct ib_qp *)ibwq); 4233 - struct mlx4_ib_modify_wq ucmd = {}; 4234 - size_t required_cmd_sz; 4271 + struct mlx4_ib_modify_wq ucmd; 4235 4272 enum ib_wq_state cur_state, new_state; 4236 - int err = 0; 4273 + int err; 4237 4274 4238 - required_cmd_sz = offsetof(typeof(ucmd), reserved) + 4239 - sizeof(ucmd.reserved); 4240 - if (udata->inlen < required_cmd_sz) 4241 - return -EINVAL; 4275 + err = ib_copy_validate_udata_in_cm(udata, ucmd, reserved, 0); 4276 + if (err) 4277 + return err; 4242 4278 4243 - if (udata->inlen > sizeof(ucmd) && 4244 - 
!ib_is_udata_cleared(udata, sizeof(ucmd), 4245 - udata->inlen - sizeof(ucmd))) 4246 - return -EOPNOTSUPP; 4247 - 4248 - if (ib_copy_from_udata(&ucmd, udata, min(sizeof(ucmd), udata->inlen))) 4249 - return -EFAULT; 4250 - 4251 - if (ucmd.comp_mask || ucmd.reserved) 4279 + if (ucmd.reserved) 4252 4280 return -EOPNOTSUPP; 4253 4281 4254 4282 if (wq_attr_mask & IB_WQ_FLAGS)
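Most hunks in this merge replace an open-coded `udata->inlen` check plus `ib_copy_from_udata()` with `ib_copy_validate_udata_in(udata, ucmd, <last field>)`. Where a `comp_mask` check also disappears, they use `ib_copy_validate_udata_in_cm(..., 0)` instead. The removed mlx4 code spells out what was being checked:

- the input must reach at least the end of a named field (`offsetof() + sizeof()`);
- bytes beyond the kernel's struct must be zero (`ib_is_udata_cleared()`);
- `comp_mask` must carry no unsupported bits.

The model below reproduces those checks in user space under stated assumptions. The struct, `LAST_END()`, and `validate_copy_in()` are hypothetical. The real helpers' exact error codes and zero-fill behavior are not visible in this diff, and the model treats the `_cm` variant's last argument as the supported-mask value.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Offset just past a member: the minimum inlen that includes it. */
#define LAST_END(type, field) \
	(offsetof(type, field) + sizeof(((type *)0)->field))

/* Hypothetical command layout; not a real uapi struct. */
struct demo_cmd {
	uint64_t buf_addr;
	uint32_t comp_mask;
	uint32_t reserved;
};

/*
 * in/inlen: bytes supplied by userspace; need: LAST_END() of the last
 * mandatory field; supported_cm: comp_mask bits the kernel accepts.
 */
static int validate_copy_in(struct demo_cmd *cmd, const uint8_t *in,
			    size_t inlen, size_t need, uint32_t supported_cm)
{
	size_t i;

	if (inlen < need)
		return -EINVAL;             /* too short for required fields */
	for (i = sizeof(*cmd); i < inlen; i++)
		if (in[i])
			return -EOPNOTSUPP; /* unknown trailing bytes must be 0 */

	memset(cmd, 0, sizeof(*cmd));       /* fields not supplied read as 0 */
	memcpy(cmd, in, inlen < sizeof(*cmd) ? inlen : sizeof(*cmd));

	if (cmd->comp_mask & ~supported_cm)
		return -EOPNOTSUPP;
	return 0;
}
```

The kernel helpers take the struct and field names directly, so the required length cannot drift from the uapi layout. The model has to pass `need` explicitly.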

+3 -2

drivers/infiniband/hw/mlx4/srq.c

··· 111 111 if (udata) { 112 112 struct mlx4_ib_create_srq ucmd; 113 113 114 - if (ib_copy_from_udata(&ucmd, udata, sizeof(ucmd))) 115 - return -EFAULT; 114 + err = ib_copy_validate_udata_in(udata, ucmd, db_addr); 115 + if (err) 116 + return err; 116 117 117 118 srq->umem = 118 119 ib_umem_get(ib_srq->device, ucmd.buf_addr, buf_size, 0);

+114 -69

drivers/infiniband/hw/mlx5/cq.c

··· 720 720 int *cqe_size, int *index, int *inlen, 721 721 struct uverbs_attr_bundle *attrs) 722 722 { 723 - struct mlx5_ib_create_cq ucmd = {}; 723 + struct mlx5_ib_create_cq ucmd; 724 724 unsigned long page_size; 725 725 unsigned int page_offset_quantized; 726 - size_t ucmdlen; 727 726 __be64 *pas; 728 727 int ncont; 729 728 void *cqc; ··· 730 731 struct mlx5_ib_ucontext *context = rdma_udata_to_drv_context( 731 732 udata, struct mlx5_ib_ucontext, ibucontext); 732 733 733 - ucmdlen = min(udata->inlen, sizeof(ucmd)); 734 - if (ucmdlen < offsetof(struct mlx5_ib_create_cq, flags)) 735 - return -EINVAL; 736 - 737 - if (ib_copy_from_udata(&ucmd, udata, ucmdlen)) 738 - return -EFAULT; 734 + err = ib_copy_validate_udata_in(udata, ucmd, cqe_comp_res_format); 735 + if (err) 736 + return err; 739 737 740 738 if ((ucmd.flags & ~(MLX5_IB_CREATE_CQ_FLAGS_CQE_128B_PAD | 741 739 MLX5_IB_CREATE_CQ_FLAGS_UAR_PAGE_INDEX | ··· 745 749 746 750 *cqe_size = ucmd.cqe_size; 747 751 748 - cq->buf.umem = 749 - ib_umem_get(&dev->ib_dev, ucmd.buf_addr, 750 - entries * ucmd.cqe_size, IB_ACCESS_LOCAL_WRITE); 751 - if (IS_ERR(cq->buf.umem)) { 752 - err = PTR_ERR(cq->buf.umem); 753 - return err; 754 - } 752 + if (!cq->ibcq.umem) 753 + cq->ibcq.umem = ib_umem_get(&dev->ib_dev, ucmd.buf_addr, 754 + entries * ucmd.cqe_size, 755 + IB_ACCESS_LOCAL_WRITE); 756 + if (IS_ERR(cq->ibcq.umem)) 757 + return PTR_ERR(cq->ibcq.umem); 755 758 756 759 page_size = mlx5_umem_find_best_cq_quantized_pgoff( 757 - cq->buf.umem, cqc, log_page_size, MLX5_ADAPTER_PAGE_SHIFT, 760 + cq->ibcq.umem, cqc, log_page_size, MLX5_ADAPTER_PAGE_SHIFT, 758 761 page_offset, 64, &page_offset_quantized); 759 762 if (!page_size) { 760 763 err = -EINVAL; ··· 764 769 if (err) 765 770 goto err_umem; 766 771 767 - ncont = ib_umem_num_dma_blocks(cq->buf.umem, page_size); 772 + ncont = ib_umem_num_dma_blocks(cq->ibcq.umem, page_size); 768 773 mlx5_ib_dbg( 769 774 dev, 770 775 "addr 0x%llx, size %u, npages %zu, page_size %lu, ncont %d\n", 771 
776 ucmd.buf_addr, entries * ucmd.cqe_size, 772 - ib_umem_num_pages(cq->buf.umem), page_size, ncont); 777 + ib_umem_num_pages(cq->ibcq.umem), page_size, ncont); 773 778 774 779 *inlen = MLX5_ST_SZ_BYTES(create_cq_in) + 775 780 MLX5_FLD_SZ_BYTES(create_cq_in, pas[0]) * ncont; ··· 780 785 } 781 786 782 787 pas = (__be64 *)MLX5_ADDR_OF(create_cq_in, *cqb, pas); 783 - mlx5_ib_populate_pas(cq->buf.umem, page_size, pas, 0); 788 + mlx5_ib_populate_pas(cq->ibcq.umem, page_size, pas, 0); 784 789 785 790 cqc = MLX5_ADDR_OF(create_cq_in, *cqb, cq_context); 786 791 MLX5_SET(cqc, cqc, log_page_size, ··· 853 858 mlx5_ib_db_unmap_user(context, &cq->db); 854 859 855 860 err_umem: 856 - ib_umem_release(cq->buf.umem); 861 + /* UMEM is released by ib_core */ 857 862 return err; 858 863 } 859 864 ··· 863 868 udata, struct mlx5_ib_ucontext, ibucontext); 864 869 865 870 mlx5_ib_db_unmap_user(context, &cq->db); 866 - ib_umem_release(cq->buf.umem); 867 871 } 868 872 869 873 static void init_cq_frag_buf(struct mlx5_ib_cq_buf *buf) ··· 943 949 cq->ibcq.comp_handler(&cq->ibcq, cq->ibcq.cq_context); 944 950 } 945 951 946 - int mlx5_ib_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr, 947 - struct uverbs_attr_bundle *attrs) 952 + int mlx5_ib_create_user_cq(struct ib_cq *ibcq, 953 + const struct ib_cq_init_attr *attr, 954 + struct uverbs_attr_bundle *attrs) 948 955 { 949 956 struct ib_udata *udata = &attrs->driver_udata; 950 957 struct ib_device *ibdev = ibcq->device; ··· 962 967 int eqn; 963 968 int err; 964 969 965 - if (entries < 0 || 966 - (entries > (1 << MLX5_CAP_GEN(dev->mdev, log_max_cq_sz)))) 970 + if (attr->cqe > (1 << MLX5_CAP_GEN(dev->mdev, log_max_cq_sz))) 967 971 return -EINVAL; 968 972 969 973 if (check_cq_create_flags(attr->flags)) ··· 975 981 cq->ibcq.cqe = entries - 1; 976 982 mutex_init(&cq->resize_mutex); 977 983 spin_lock_init(&cq->lock); 978 - cq->resize_buf = NULL; 979 - cq->resize_umem = NULL; 980 - cq->create_flags = attr->flags; 984 + if (attr->flags & 
IB_UVERBS_CQ_FLAGS_TIMESTAMP_COMPLETION) 985 + cq->private_flags |= MLX5_IB_CQ_PR_TIMESTAMP_COMPLETION; 981 986 INIT_LIST_HEAD(&cq->list_send_qp); 982 987 INIT_LIST_HEAD(&cq->list_recv_qp); 983 988 984 - if (udata) { 985 - err = create_cq_user(dev, udata, cq, entries, &cqb, &cqe_size, 986 - &index, &inlen, attrs); 987 - if (err) 988 - return err; 989 - } else { 990 - cqe_size = cache_line_size() == 128 ? 128 : 64; 991 - err = create_cq_kernel(dev, cq, entries, cqe_size, &cqb, 992 - &index, &inlen); 993 - if (err) 994 - return err; 995 - 996 - INIT_WORK(&cq->notify_work, notify_soft_wc_handler); 997 - } 989 + err = create_cq_user(dev, udata, cq, entries, &cqb, &cqe_size, &index, 990 + &inlen, attrs); 991 + if (err) 992 + return err; 998 993 999 994 err = mlx5_comp_eqn_get(dev->mdev, vector, &eqn); 1000 995 if (err) ··· 1000 1017 MLX5_SET(cqc, cqc, uar_page, index); 1001 1018 MLX5_SET(cqc, cqc, c_eqn_or_apu_element, eqn); 1002 1019 MLX5_SET64(cqc, cqc, dbr_addr, cq->db.dma); 1003 - if (cq->create_flags & IB_UVERBS_CQ_FLAGS_IGNORE_OVERRUN) 1020 + if (attr->flags & IB_UVERBS_CQ_FLAGS_IGNORE_OVERRUN) 1004 1021 MLX5_SET(cqc, cqc, oi, 1); 1005 1022 1006 - if (udata) { 1007 - cq->mcq.comp = mlx5_add_cq_to_tasklet; 1008 - cq->mcq.tasklet_ctx.comp = mlx5_ib_cq_comp; 1009 - } else { 1010 - cq->mcq.comp = mlx5_ib_cq_comp; 1011 - } 1023 + cq->mcq.comp = mlx5_add_cq_to_tasklet; 1024 + cq->mcq.tasklet_ctx.comp = mlx5_ib_cq_comp; 1012 1025 1013 1026 err = mlx5_core_create_cq(dev->mdev, &cq->mcq, cqb, inlen, out, sizeof(out)); 1014 1027 if (err) ··· 1015 1036 1016 1037 INIT_LIST_HEAD(&cq->wc_list); 1017 1038 1018 - if (udata) 1019 - if (ib_copy_to_udata(udata, &cq->mcq.cqn, sizeof(__u32))) { 1020 - err = -EFAULT; 1021 - goto err_cmd; 1022 - } 1023 - 1039 + if (ib_copy_to_udata(udata, &cq->mcq.cqn, sizeof(__u32))) { 1040 + err = -EFAULT; 1041 + goto err_cmd; 1042 + } 1024 1043 1025 1044 kvfree(cqb); 1026 1045 return 0; ··· 1028 1051 1029 1052 err_cqb: 1030 1053 kvfree(cqb); 1031 - 
if (udata) 1032 - destroy_cq_user(cq, udata); 1033 - else 1034 - destroy_cq_kernel(dev, cq); 1054 + destroy_cq_user(cq, udata); 1055 + return err; 1056 + } 1057 + 1058 + 1059 + int mlx5_ib_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr, 1060 + struct uverbs_attr_bundle *attrs) 1061 + { 1062 + struct ib_device *ibdev = ibcq->device; 1063 + int entries = attr->cqe; 1064 + int vector = attr->comp_vector; 1065 + struct mlx5_ib_dev *dev = to_mdev(ibdev); 1066 + struct mlx5_ib_cq *cq = to_mcq(ibcq); 1067 + u32 out[MLX5_ST_SZ_DW(create_cq_out)]; 1068 + int index; 1069 + int inlen; 1070 + u32 *cqb = NULL; 1071 + void *cqc; 1072 + int cqe_size; 1073 + int eqn; 1074 + int err; 1075 + 1076 + if (attr->cqe > (1 << MLX5_CAP_GEN(dev->mdev, log_max_cq_sz))) 1077 + return -EINVAL; 1078 + 1079 + entries = roundup_pow_of_two(entries + 1); 1080 + if (entries > (1 << MLX5_CAP_GEN(dev->mdev, log_max_cq_sz))) 1081 + return -EINVAL; 1082 + 1083 + cq->ibcq.cqe = entries - 1; 1084 + mutex_init(&cq->resize_mutex); 1085 + spin_lock_init(&cq->lock); 1086 + INIT_LIST_HEAD(&cq->list_send_qp); 1087 + INIT_LIST_HEAD(&cq->list_recv_qp); 1088 + 1089 + cqe_size = cache_line_size() == 128 ? 
128 : 64; 1090 + err = create_cq_kernel(dev, cq, entries, cqe_size, &cqb, &index, 1091 + &inlen); 1092 + if (err) 1093 + return err; 1094 + 1095 + INIT_WORK(&cq->notify_work, notify_soft_wc_handler); 1096 + 1097 + err = mlx5_comp_eqn_get(dev->mdev, vector, &eqn); 1098 + if (err) 1099 + goto err_cqb; 1100 + 1101 + cq->cqe_size = cqe_size; 1102 + 1103 + cqc = MLX5_ADDR_OF(create_cq_in, cqb, cq_context); 1104 + MLX5_SET(cqc, cqc, cqe_sz, 1105 + cqe_sz_to_mlx_sz(cqe_size, 1106 + cq->private_flags & 1107 + MLX5_IB_CQ_PR_FLAGS_CQE_128_PAD)); 1108 + MLX5_SET(cqc, cqc, log_cq_size, ilog2(entries)); 1109 + MLX5_SET(cqc, cqc, uar_page, index); 1110 + MLX5_SET(cqc, cqc, c_eqn_or_apu_element, eqn); 1111 + MLX5_SET64(cqc, cqc, dbr_addr, cq->db.dma); 1112 + 1113 + cq->mcq.comp = mlx5_ib_cq_comp; 1114 + 1115 + err = mlx5_core_create_cq(dev->mdev, &cq->mcq, cqb, inlen, out, 1116 + sizeof(out)); 1117 + if (err) 1118 + goto err_cqb; 1119 + 1120 + mlx5_ib_dbg(dev, "cqn 0x%x\n", cq->mcq.cqn); 1121 + cq->mcq.event = mlx5_ib_cq_event; 1122 + 1123 + INIT_LIST_HEAD(&cq->wc_list); 1124 + kvfree(cqb); 1125 + return 0; 1126 + 1127 + err_cqb: 1128 + kvfree(cqb); 1129 + destroy_cq_kernel(dev, cq); 1035 1130 return err; 1036 1131 } 1037 1132 ··· 1229 1180 struct ib_umem *umem; 1230 1181 int err; 1231 1182 1232 - err = ib_copy_from_udata(&ucmd, udata, sizeof(ucmd)); 1183 + err = ib_copy_validate_udata_in(udata, ucmd, reserved1); 1233 1184 if (err) 1234 1185 return err; 1235 1186 ··· 1331 1282 return 0; 1332 1283 } 1333 1284 1334 - int mlx5_ib_resize_cq(struct ib_cq *ibcq, int entries, struct ib_udata *udata) 1285 + int mlx5_ib_resize_cq(struct ib_cq *ibcq, unsigned int entries, 1286 + struct ib_udata *udata) 1335 1287 { 1336 1288 struct mlx5_ib_dev *dev = to_mdev(ibcq->device); 1337 1289 struct mlx5_ib_cq *cq = to_mcq(ibcq); ··· 1352 1302 return -ENOSYS; 1353 1303 } 1354 1304 1355 - if (entries < 1 || 1356 - entries > (1 << MLX5_CAP_GEN(dev->mdev, log_max_cq_sz))) { 1357 - mlx5_ib_warn(dev, 
"wrong entries number %d, max %d\n", 1358 - entries, 1359 - 1 << MLX5_CAP_GEN(dev->mdev, log_max_cq_sz)); 1305 + if (entries > (1 << MLX5_CAP_GEN(dev->mdev, log_max_cq_sz))) 1360 1306 return -EINVAL; 1361 - } 1362 1307 1363 1308 entries = roundup_pow_of_two(entries + 1); 1364 1309 if (entries > (1 << MLX5_CAP_GEN(dev->mdev, log_max_cq_sz)) + 1) ··· 1434 1389 1435 1390 if (udata) { 1436 1391 cq->ibcq.cqe = entries - 1; 1437 - ib_umem_release(cq->buf.umem); 1438 - cq->buf.umem = cq->resize_umem; 1392 + ib_umem_release(cq->ibcq.umem); 1393 + cq->ibcq.umem = cq->resize_umem; 1439 1394 cq->resize_umem = NULL; 1440 1395 } else { 1441 1396 struct mlx5_ib_cq_buf tbuf;

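In both mlx4 and mlx5, the user CQ buffer moves from a driver-private field (`cq->umem`, `cq->buf.umem`) to `ibcq->umem`. The driver only calls `ib_umem_get()` when that field is still empty. The error and destroy paths no longer call `ib_umem_release()` ("UMEM is released by ib_core"). Below is a minimal model of that ownership rule, assuming the core may pre-populate the buffer and always frees it afterwards. The types and functions are hypothetical, and the kernel's ERR_PTR error handling is simplified to NULL.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-ins for struct ib_umem and struct ib_cq. */
struct umem { size_t len; };
struct cq { struct umem *umem; };

static struct umem *umem_get(size_t len)
{
	struct umem *u = malloc(sizeof(*u));

	if (u)
		u->len = len;
	return u;
}

/* Driver side: pin a buffer only if the core did not supply one, and
 * never free it on error, because the core owns cq->umem. */
static int driver_create_user_cq(struct cq *cq, size_t len, int fail_later)
{
	if (!cq->umem)
		cq->umem = umem_get(len);
	if (!cq->umem)
		return -ENOMEM;
	if (fail_later)
		return -EINVAL; /* no release here; the core cleans up */
	return 0;
}

/* Core side: the single release point for both errors and teardown. */
static void core_release_cq(struct cq *cq)
{
	free(cq->umem);
	cq->umem = NULL;
}
```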
+3 -3

drivers/infiniband/hw/mlx5/devx.c

··· 1557 1557 if (IS_ERR(cmd_out)) 1558 1558 return PTR_ERR(cmd_out); 1559 1559 1560 - obj = kzalloc(sizeof(struct devx_obj), GFP_KERNEL); 1560 + obj = kzalloc_obj(*obj); 1561 1561 if (!obj) 1562 1562 return -ENOMEM; 1563 1563 ··· 2158 2158 if (err) 2159 2159 goto err; 2160 2160 2161 - event_sub = kzalloc(sizeof(*event_sub), GFP_KERNEL); 2161 + event_sub = kzalloc_obj(*event_sub); 2162 2162 if (!event_sub) { 2163 2163 err = -ENOMEM; 2164 2164 goto err; ··· 2398 2398 if (err) 2399 2399 return err; 2400 2400 2401 - obj = kzalloc(sizeof(struct devx_umem), GFP_KERNEL); 2401 + obj = kzalloc_obj(*obj); 2402 2402 if (!obj) 2403 2403 return -ENOMEM; 2404 2404
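The devx.c, dm.c and fs.c hunks are mechanical conversions, as are the `kcalloc()` calls in the mlx4 and mlx5 main.c diffs. `kzalloc(sizeof(*obj), GFP_KERNEL)` becomes `kzalloc_obj(*obj)`, and `kcalloc(n, sizeof(*x), GFP_KERNEL)` becomes `kzalloc_objs(*x, n)`. Taking the size from the pointed-to expression keeps it tied to the pointer's type. The old `sizeof(struct devx_obj)` form repeated the type name by hand. The user-space analogs below only illustrate that calling convention. They are hypothetical macros built on `calloc()`, which, like `kcalloc()`, returns NULL if `n * size` overflows. They say nothing about which GFP flags the kernel macros use.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical user-space analogs; the kernel macros may differ. */
#define zalloc_obj(obj)     calloc(1, sizeof(obj))
#define zalloc_objs(obj, n) calloc((n), sizeof(obj))

struct flow_matcher {
	int id;
	long priority;
};
```

Usage mirrors the kernel form: `struct flow_matcher *m = zalloc_obj(*m);` sizes the allocation from `*m` without evaluating it.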

+1 -1

drivers/infiniband/hw/mlx5/dm.c

··· 228 228 if (!err || err != -ENOENT) 229 229 goto err_unlock; 230 230 231 - op_entry = kzalloc(sizeof(*op_entry), GFP_KERNEL); 231 + op_entry = kzalloc_obj(*op_entry); 232 232 if (!op_entry) 233 233 goto err_unlock; 234 234

+3 -3

drivers/infiniband/hw/mlx5/fs.c

··· 2917 2917 struct mlx5_ib_flow_matcher *obj; 2918 2918 int err; 2919 2919 2920 - obj = kzalloc(sizeof(struct mlx5_ib_flow_matcher), GFP_KERNEL); 2920 + obj = kzalloc_obj(*obj); 2921 2921 if (!obj) 2922 2922 return -ENOMEM; 2923 2923 ··· 3017 3017 if (err) 3018 3018 return err; 3019 3019 3020 - obj = kzalloc(sizeof(*obj), GFP_KERNEL); 3020 + obj = kzalloc_obj(*obj); 3021 3021 if (!obj) 3022 3022 return -ENOMEM; 3023 3023 ··· 3259 3259 if (!mlx5_ib_flow_action_packet_reformat_valid(mdev, dv_prt, ft_type)) 3260 3260 return -EOPNOTSUPP; 3261 3261 3262 - maction = kzalloc(sizeof(*maction), GFP_KERNEL); 3262 + maction = kzalloc_obj(*maction); 3263 3263 if (!maction) 3264 3264 return -ENOMEM; 3265 3265
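The mlx5 main.c diff that follows turns the single VAR bitmap into a `struct mlx5_var_region` and adds a second `tlp_var_region`. `alloc_var_entry()` now picks the region from `MLX5_IB_UAPI_VAR_ALLOC_FLAG_TLP` and takes the first clear bit. It then computes the address as `hw_start_addr + page_idx * stride_size`. The mmap free path clears the bit in whichever region matches the entry's mmap type. The sketch models that allocator in user space, with a few simplifications:

- a 64-bit word stands in for the kernel bitmap;
- `FLAG_TLP` is a hypothetical stand-in for the uapi flag;
- the `bitmap_lock` mutex is omitted because the model is single-threaded.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define FLAG_TLP 0x1u /* hypothetical stand-in for the uapi TLP flag */

struct var_region {
	uint64_t bitmap;          /* bit i set = hardware entry i in use */
	unsigned int num_entries; /* at most 64 in this model */
	uint64_t hw_start_addr;
	uint64_t stride_size;
};

struct var_table {
	struct var_region var_region;
	struct var_region tlp_var_region;
};

static int alloc_var(struct var_table *t, uint32_t flags,
		     unsigned int *idx, uint64_t *addr)
{
	struct var_region *r = (flags & FLAG_TLP) ? &t->tlp_var_region
						  : &t->var_region;
	unsigned int i;

	assert(r->num_entries <= 64);
	for (i = 0; i < r->num_entries; i++) {      /* find_first_zero_bit() */
		if (!(r->bitmap & (UINT64_C(1) << i))) {
			r->bitmap |= UINT64_C(1) << i;      /* set_bit() */
			*idx = i;
			*addr = r->hw_start_addr + (uint64_t)i * r->stride_size;
			return 0;
		}
	}
	return -ENOSPC;                             /* region exhausted */
}

static void free_var(struct var_region *r, unsigned int idx)
{
	r->bitmap &= ~(UINT64_C(1) << idx);         /* clear_bit() */
}
```

Keeping one bitmap per region means exhausting the TLP region leaves the regular region unaffected, and vice versa.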

+127 -45

drivers/infiniband/hw/mlx5/main.c

··· 2179 2179 { 2180 2180 struct ib_device *ibdev = uctx->device; 2181 2181 struct mlx5_ib_dev *dev = to_mdev(ibdev); 2182 - struct mlx5_ib_alloc_ucontext_req_v2 req = {}; 2182 + struct mlx5_ib_alloc_ucontext_req_v2 req; 2183 2183 struct mlx5_ib_alloc_ucontext_resp resp = {}; 2184 2184 struct mlx5_ib_ucontext *context = to_mucontext(uctx); 2185 2185 struct mlx5_bfreg_info *bfregi; ··· 2245 2245 2246 2246 mutex_init(&bfregi->lock); 2247 2247 bfregi->lib_uar_4k = lib_uar_4k; 2248 - bfregi->count = kcalloc(bfregi->total_num_bfregs, sizeof(*bfregi->count), 2249 - GFP_KERNEL); 2248 + bfregi->count = kzalloc_objs(*bfregi->count, bfregi->total_num_bfregs); 2250 2249 if (!bfregi->count) { 2251 2250 err = -ENOMEM; 2252 2251 goto out_ucap; 2253 2252 } 2254 2253 2255 - bfregi->sys_pages = kcalloc(bfregi->num_sys_pages, 2256 - sizeof(*bfregi->sys_pages), 2257 - GFP_KERNEL); 2254 + bfregi->sys_pages = 2255 + kzalloc_objs(*bfregi->sys_pages, bfregi->num_sys_pages); 2258 2256 if (!bfregi->sys_pages) { 2259 2257 err = -ENOMEM; 2260 2258 goto out_count; ··· 2517 2519 return rdma_user_mmap_entry_get_pgoff(ucontext, entry_pgoff); 2518 2520 } 2519 2521 2522 + static void mlx5_ib_free_var_mmap_entry(struct mlx5_user_mmap_entry *mentry, 2523 + struct mlx5_var_region *var_region) 2524 + { 2525 + mutex_lock(&var_region->bitmap_lock); 2526 + clear_bit(mentry->page_idx, var_region->bitmap); 2527 + mutex_unlock(&var_region->bitmap_lock); 2528 + kfree(mentry); 2529 + } 2530 + 2520 2531 static void mlx5_ib_mmap_free(struct rdma_user_mmap_entry *entry) 2521 2532 { 2522 2533 struct mlx5_user_mmap_entry *mentry = to_mmmap(entry); 2523 2534 struct mlx5_ib_dev *dev = to_mdev(entry->ucontext->device); 2524 2535 struct mlx5_var_table *var_table = &dev->var_table; 2525 2536 struct mlx5_ib_ucontext *context = to_mucontext(entry->ucontext); 2537 + struct mlx5_var_region *var_region; 2526 2538 2527 2539 switch (mentry->mmap_flag) { 2528 2540 case MLX5_IB_MMAP_TYPE_MEMIC: ··· 2540 2532 
mlx5_ib_dm_mmap_free(dev, mentry); 2541 2533 break; 2542 2534 case MLX5_IB_MMAP_TYPE_VAR: 2543 - mutex_lock(&var_table->bitmap_lock); 2544 - clear_bit(mentry->page_idx, var_table->bitmap); 2545 - mutex_unlock(&var_table->bitmap_lock); 2546 - kfree(mentry); 2535 + var_region = &var_table->var_region; 2536 + mlx5_ib_free_var_mmap_entry(mentry, var_region); 2537 + break; 2538 + case MLX5_IB_MMAP_TYPE_TLP_VAR: 2539 + var_region = &var_table->tlp_var_region; 2540 + mlx5_ib_free_var_mmap_entry(mentry, var_region); 2547 2541 break; 2548 2542 case MLX5_IB_MMAP_TYPE_UAR_WC: 2549 2543 case MLX5_IB_MMAP_TYPE_UAR_NC: ··· 2696 2686 mentry = to_mmmap(entry); 2697 2687 pfn = (mentry->address >> PAGE_SHIFT); 2698 2688 if (mentry->mmap_flag == MLX5_IB_MMAP_TYPE_VAR || 2689 + mentry->mmap_flag == MLX5_IB_MMAP_TYPE_TLP_VAR || 2699 2690 mentry->mmap_flag == MLX5_IB_MMAP_TYPE_UAR_NC) 2700 2691 prot = pgprot_noncached(vma->vm_page_prot); 2701 2692 else ··· 4152 4141 } 4153 4142 4154 4143 static struct mlx5_user_mmap_entry * 4155 - alloc_var_entry(struct mlx5_ib_ucontext *c) 4144 + alloc_var_entry(struct mlx5_ib_ucontext *c, u32 flags) 4156 4145 { 4157 4146 struct mlx5_user_mmap_entry *entry; 4147 + struct mlx5_var_region *var_region; 4158 4148 struct mlx5_var_table *var_table; 4159 4149 u32 page_idx; 4160 4150 int err; 4161 4151 4162 4152 var_table = &to_mdev(c->ibucontext.device)->var_table; 4153 + if (flags & MLX5_IB_UAPI_VAR_ALLOC_FLAG_TLP) 4154 + var_region = &var_table->tlp_var_region; 4155 + else 4156 + var_region = &var_table->var_region; 4157 + 4163 4158 entry = kzalloc_obj(*entry); 4164 4159 if (!entry) 4165 4160 return ERR_PTR(-ENOMEM); 4166 4161 4167 - mutex_lock(&var_table->bitmap_lock); 4168 - page_idx = find_first_zero_bit(var_table->bitmap, 4169 - var_table->num_var_hw_entries); 4170 - if (page_idx >= var_table->num_var_hw_entries) { 4162 + mutex_lock(&var_region->bitmap_lock); 4163 + page_idx = find_first_zero_bit(var_region->bitmap, 4164 + 
var_region->num_var_hw_entries); 4165 + if (page_idx >= var_region->num_var_hw_entries) { 4171 4166 err = -ENOSPC; 4172 - mutex_unlock(&var_table->bitmap_lock); 4167 + mutex_unlock(&var_region->bitmap_lock); 4173 4168 goto end; 4174 4169 } 4175 4170 4176 - set_bit(page_idx, var_table->bitmap); 4177 - mutex_unlock(&var_table->bitmap_lock); 4171 + set_bit(page_idx, var_region->bitmap); 4172 + mutex_unlock(&var_region->bitmap_lock); 4178 4173 4179 - entry->address = var_table->hw_start_addr + 4180 - (page_idx * var_table->stride_size); 4174 + entry->address = var_region->hw_start_addr + 4175 + (page_idx * var_region->stride_size); 4181 4176 entry->page_idx = page_idx; 4182 - entry->mmap_flag = MLX5_IB_MMAP_TYPE_VAR; 4177 + entry->mmap_flag = flags & MLX5_IB_UAPI_VAR_ALLOC_FLAG_TLP ? 4178 + MLX5_IB_MMAP_TYPE_TLP_VAR : 4179 + MLX5_IB_MMAP_TYPE_VAR; 4183 4180 4184 4181 err = mlx5_rdma_user_mmap_entry_insert(c, entry, 4185 - var_table->stride_size); 4182 + var_region->stride_size); 4186 4183 if (err) 4187 4184 goto err_insert; 4188 4185 4189 4186 return entry; 4190 4187 4191 4188 err_insert: 4192 - mutex_lock(&var_table->bitmap_lock); 4193 - clear_bit(page_idx, var_table->bitmap); 4194 - mutex_unlock(&var_table->bitmap_lock); 4189 + mutex_lock(&var_region->bitmap_lock); 4190 + clear_bit(page_idx, var_region->bitmap); 4191 + mutex_unlock(&var_region->bitmap_lock); 4195 4192 end: 4196 4193 kfree(entry); 4197 4194 return ERR_PTR(err); ··· 4210 4191 { 4211 4192 struct ib_uobject *uobj = uverbs_attr_get_uobject( 4212 4193 attrs, MLX5_IB_ATTR_VAR_OBJ_ALLOC_HANDLE); 4213 - struct mlx5_ib_ucontext *c; 4214 4194 struct mlx5_user_mmap_entry *entry; 4195 + struct mlx5_ib_ucontext *c; 4215 4196 u64 mmap_offset; 4197 + u32 flags = 0; 4216 4198 u32 length; 4217 4199 int err; 4218 4200 ··· 4221 4201 if (IS_ERR(c)) 4222 4202 return PTR_ERR(c); 4223 4203 4224 - entry = alloc_var_entry(c); 4204 + err = uverbs_get_flags32(&flags, attrs, 4205 + MLX5_IB_ATTR_VAR_OBJ_ALLOC_FLAGS, 4206 + 
MLX5_IB_UAPI_VAR_ALLOC_FLAG_TLP); 4207 + if (err) 4208 + return err; 4209 + 4210 + if (flags & MLX5_IB_UAPI_VAR_ALLOC_FLAG_TLP) { 4211 + if (!MLX5_CAP_GEN(to_mdev(c->ibucontext.device)->mdev, 4212 + tlp_device_emulation_manager)) 4213 + return -EOPNOTSUPP; 4214 + } else { 4215 + if (!(MLX5_CAP_GEN_64(to_mdev(c->ibucontext.device)->mdev, 4216 + general_obj_types) & 4217 + MLX5_GENERAL_OBJ_TYPES_CAP_VIRTIO_NET_Q)) 4218 + return -EOPNOTSUPP; 4219 + } 4220 + 4221 + entry = alloc_var_entry(c, flags); 4225 4222 if (IS_ERR(entry)) 4226 4223 return PTR_ERR(entry); 4227 4224 ··· 4268 4231 MLX5_IB_OBJECT_VAR, 4269 4232 UVERBS_ACCESS_NEW, 4270 4233 UA_MANDATORY), 4234 + UVERBS_ATTR_FLAGS_IN(MLX5_IB_ATTR_VAR_OBJ_ALLOC_FLAGS, 4235 + enum mlx5_ib_uapi_var_alloc_flags, 4236 + UA_OPTIONAL), 4271 4237 UVERBS_ATTR_PTR_OUT(MLX5_IB_ATTR_VAR_OBJ_ALLOC_PAGE_ID, 4272 4238 UVERBS_ATTR_TYPE(u32), 4273 4239 UA_MANDATORY), ··· 4298 4258 struct mlx5_ib_dev *dev = to_mdev(device); 4299 4259 4300 4260 return (MLX5_CAP_GEN_64(dev->mdev, general_obj_types) & 4301 - MLX5_GENERAL_OBJ_TYPES_CAP_VIRTIO_NET_Q); 4261 + MLX5_GENERAL_OBJ_TYPES_CAP_VIRTIO_NET_Q) || 4262 + MLX5_CAP_GEN(dev->mdev, tlp_device_emulation_manager); 4302 4263 } 4303 4264 4304 4265 static struct mlx5_user_mmap_entry * ··· 4460 4419 4461 4420 static void mlx5_ib_stage_init_cleanup(struct mlx5_ib_dev *dev) 4462 4421 { 4422 + mlx5_cmd_cleanup_async_ctx(&dev->async_ctx); 4463 4423 mlx5_ib_data_direct_cleanup(dev); 4464 4424 mlx5_ib_cleanup_multiport_master(dev); 4465 4425 WARN_ON(!xa_empty(&dev->odp_mkeys)); ··· 4530 4488 if (err && err != -EOPNOTSUPP) 4531 4489 goto err_dd; 4532 4490 4491 + mlx5_cmd_init_async_ctx(mdev, &dev->async_ctx); 4492 + 4533 4493 return 0; 4534 4494 err_dd: 4535 4495 mlx5_ib_data_direct_cleanup(dev); ··· 4562 4518 .check_mr_status = mlx5_ib_check_mr_status, 4563 4519 .create_ah = mlx5_ib_create_ah, 4564 4520 .create_cq = mlx5_ib_create_cq, 4521 + .create_user_cq = mlx5_ib_create_user_cq, 4565 4522 .create_qp 
= mlx5_ib_create_qp, 4566 4523 .create_srq = mlx5_ib_create_srq, 4567 4524 .create_user_ah = mlx5_ib_create_ah, ··· 4613 4568 .reg_user_mr_dmabuf = mlx5_ib_reg_user_mr_dmabuf, 4614 4569 .req_notify_cq = mlx5_ib_arm_cq, 4615 4570 .rereg_user_mr = mlx5_ib_rereg_user_mr, 4616 - .resize_cq = mlx5_ib_resize_cq, 4571 + .resize_user_cq = mlx5_ib_resize_cq, 4617 4572 .ufile_hw_cleanup = mlx5_ib_ufile_hw_cleanup, 4618 4573 4619 4574 INIT_RDMA_OBJ_SIZE(ib_ah, mlx5_ib_ah, ibah), ··· 4652 4607 INIT_RDMA_OBJ_SIZE(ib_xrcd, mlx5_ib_xrcd, ibxrcd), 4653 4608 }; 4654 4609 4655 - static int mlx5_ib_init_var_table(struct mlx5_ib_dev *dev) 4610 + static int mlx5_ib_init_var_region(struct mlx5_ib_dev *dev) 4656 4611 { 4612 + struct mlx5_var_region *var_region = &dev->var_table.var_region; 4657 4613 struct mlx5_core_dev *mdev = dev->mdev; 4658 - struct mlx5_var_table *var_table = &dev->var_table; 4659 4614 u8 log_doorbell_bar_size; 4660 4615 u8 log_doorbell_stride; 4661 4616 u64 bar_size; ··· 4664 4619 log_doorbell_bar_size); 4665 4620 log_doorbell_stride = MLX5_CAP_DEV_VDPA_EMULATION(mdev, 4666 4621 log_doorbell_stride); 4667 - var_table->hw_start_addr = dev->mdev->bar_addr + 4622 + var_region->hw_start_addr = dev->mdev->bar_addr + 4668 4623 MLX5_CAP64_DEV_VDPA_EMULATION(mdev, 4669 4624 doorbell_bar_offset); 4670 4625 bar_size = (1ULL << log_doorbell_bar_size) * 4096; 4671 - var_table->stride_size = 1ULL << log_doorbell_stride; 4672 - var_table->num_var_hw_entries = div_u64(bar_size, 4673 - var_table->stride_size); 4674 - mutex_init(&var_table->bitmap_lock); 4675 - var_table->bitmap = bitmap_zalloc(var_table->num_var_hw_entries, 4676 - GFP_KERNEL); 4677 - return (var_table->bitmap) ? 
0 : -ENOMEM; 4626 + var_region->stride_size = 1ULL << log_doorbell_stride; 4627 + var_region->num_var_hw_entries = div_u64(bar_size, 4628 + var_region->stride_size); 4629 + mutex_init(&var_region->bitmap_lock); 4630 + var_region->bitmap = bitmap_zalloc(var_region->num_var_hw_entries, 4631 + GFP_KERNEL); 4632 + return (var_region->bitmap) ? 0 : -ENOMEM; 4633 + } 4634 + 4635 + static int mlx5_ib_init_tlp_var_region(struct mlx5_ib_dev *dev) 4636 + { 4637 + struct mlx5_var_region *var_region = &dev->var_table.tlp_var_region; 4638 + struct mlx5_core_dev *mdev = dev->mdev; 4639 + u8 log_tlp_var_stride; 4640 + 4641 + log_tlp_var_stride = 4642 + MLX5_CAP_DEV_TLP_EMULATION(mdev, log_tlp_rsp_gw_page_stride); 4643 + var_region->hw_start_addr = 4644 + dev->mdev->bar_addr + 4645 + MLX5_CAP64_DEV_TLP_EMULATION(mdev, tlp_rsp_gw_pages_bar_offset); 4646 + 4647 + var_region->stride_size = (1ULL << log_tlp_var_stride) * 4096; 4648 + var_region->num_var_hw_entries = 4649 + MLX5_CAP_DEV_TLP_EMULATION(mdev, tlp_rsp_gw_num_pages); 4650 + 4651 + mutex_init(&var_region->bitmap_lock); 4652 + var_region->bitmap = bitmap_zalloc(var_region->num_var_hw_entries, 4653 + GFP_KERNEL); 4654 + return (var_region->bitmap) ? 
0 : -ENOMEM; 4678 4655 } 4679 4656 4680 4657 static void mlx5_ib_cleanup_ucaps(struct mlx5_ib_dev *dev) ··· 4734 4667 return ret; 4735 4668 } 4736 4669 4670 + static void mlx5_ib_cleanup_var_table(struct mlx5_ib_dev *dev) 4671 + { 4672 + bitmap_free(dev->var_table.var_region.bitmap); 4673 + bitmap_free(dev->var_table.tlp_var_region.bitmap); 4674 + } 4675 + 4737 4676 static void mlx5_ib_stage_caps_cleanup(struct mlx5_ib_dev *dev) 4738 4677 { 4739 4678 if (MLX5_CAP_GEN_2_64(dev->mdev, general_obj_types_127_64) & 4740 4679 MLX5_HCA_CAP_2_GENERAL_OBJECT_TYPES_RDMA_CTRL) 4741 4680 mlx5_ib_cleanup_ucaps(dev); 4742 4681 4743 - bitmap_free(dev->var_table.bitmap); 4682 + mlx5_ib_cleanup_var_table(dev); 4744 4683 } 4745 4684 4746 4685 static int mlx5_ib_stage_caps_init(struct mlx5_ib_dev *dev) ··· 4794 4721 4795 4722 if (MLX5_CAP_GEN_64(dev->mdev, general_obj_types) & 4796 4723 MLX5_GENERAL_OBJ_TYPES_CAP_VIRTIO_NET_Q) { 4797 - err = mlx5_ib_init_var_table(dev); 4724 + err = mlx5_ib_init_var_region(dev); 4798 4725 if (err) 4799 4726 return err; 4800 4727 } ··· 4806 4733 goto err_ucaps; 4807 4734 } 4808 4735 4736 + if (MLX5_CAP_GEN(dev->mdev, tlp_device_emulation_manager)) { 4737 + err = mlx5_ib_init_tlp_var_region(dev); 4738 + if (err) 4739 + goto err_tlp_var; 4740 + } 4741 + 4809 4742 dev->ib_dev.use_cq_dim = true; 4810 4743 4811 4744 return 0; 4812 4745 4746 + err_tlp_var: 4747 + mlx5_ib_cleanup_ucaps(dev); 4813 4748 err_ucaps: 4814 - bitmap_free(dev->var_table.bitmap); 4749 + bitmap_free(dev->var_table.var_region.bitmap); 4815 4750 return err; 4816 4751 } 4817 4752 ··· 4955 4874 4956 4875 static void mlx5_ib_stage_pre_ib_reg_umr_cleanup(struct mlx5_ib_dev *dev) 4957 4876 { 4958 - mlx5_mkey_cache_cleanup(dev); 4877 + mlx5r_frmr_pools_cleanup(&dev->ib_dev); 4959 4878 mlx5r_umr_resource_cleanup(dev); 4960 4879 mlx5r_umr_cleanup(dev); 4961 4880 } ··· 4973 4892 if (ret) 4974 4893 return ret; 4975 4894 4976 - ret = mlx5_mkey_cache_init(dev); 4895 + ret = 
mlx5r_frmr_pools_init(&dev->ib_dev); 4977 4896 if (ret) 4978 - mlx5_ib_warn(dev, "mr cache init failed %d\n", ret); 4897 + mlx5_ib_warn(dev, "frmr pools init failed %d\n", ret); 4898 + 4979 4899 return ret; 4980 4900 } 4981 4901

+1

drivers/infiniband/hw/mlx5/mem.c

@@ -31,6 +31,7 @@
  */
 
 #include <rdma/ib_umem_odp.h>
+#include <rdma/iter.h>
 #include "mlx5_ib.h"
 
 /*

+15 -87

drivers/infiniband/hw/mlx5/mlx5_ib.h

··· 162 162 MLX5_IB_MMAP_TYPE_UAR_WC = 3, 163 163 MLX5_IB_MMAP_TYPE_UAR_NC = 4, 164 164 MLX5_IB_MMAP_TYPE_MEMIC_OP = 5, 165 + MLX5_IB_MMAP_TYPE_TLP_VAR = 6, 165 166 }; 166 167 167 168 struct mlx5_bfreg_info { ··· 561 560 enum mlx5_ib_cq_pr_flags { 562 561 MLX5_IB_CQ_PR_FLAGS_CQE_128_PAD = 1 << 0, 563 562 MLX5_IB_CQ_PR_FLAGS_REAL_TIME_TS = 1 << 1, 563 + MLX5_IB_CQ_PR_TIMESTAMP_COMPLETION = 1 << 2, 564 564 }; 565 565 566 566 struct mlx5_ib_cq { ··· 582 580 int cqe_size; 583 581 struct list_head list_send_qp; 584 582 struct list_head list_recv_qp; 585 - u32 create_flags; 586 583 struct list_head wc_list; 587 584 enum ib_cq_notify_flags notify_flags; 588 585 struct work_struct notify_work; ··· 641 640 /* Used for non-existent ph value */ 642 641 #define MLX5_IB_NO_PH 0xff 643 642 644 - struct mlx5r_cache_rb_key { 645 - u8 ats:1; 646 - u8 ph; 647 - u16 st_index; 648 - unsigned int access_mode; 649 - unsigned int access_flags; 650 - unsigned int ndescs; 651 - }; 652 - 653 643 struct mlx5_ib_mkey { 654 644 u32 key; 655 645 enum mlx5_mkey_type type; 656 646 unsigned int ndescs; 657 647 struct wait_queue_head wait; 658 648 refcount_t usecount; 659 - /* Cacheable user Mkey must hold either a rb_key or a cache_ent. 
*/ 660 - struct mlx5r_cache_rb_key rb_key; 661 - struct mlx5_cache_ent *cache_ent; 662 - u8 cacheable : 1; 663 649 }; 664 650 665 651 #define MLX5_IB_MTT_PRESENT (MLX5_IB_MTT_READ | MLX5_IB_MTT_WRITE) ··· 769 781 unsigned int state; 770 782 /* Protects from repeat UMR QP creation */ 771 783 struct mutex init_lock; 772 - }; 773 - 774 - #define NUM_MKEYS_PER_PAGE \ 775 - ((PAGE_SIZE - sizeof(struct list_head)) / sizeof(u32)) 776 - 777 - struct mlx5_mkeys_page { 778 - u32 mkeys[NUM_MKEYS_PER_PAGE]; 779 - struct list_head list; 780 - }; 781 - static_assert(sizeof(struct mlx5_mkeys_page) == PAGE_SIZE); 782 - 783 - struct mlx5_mkeys_queue { 784 - struct list_head pages_list; 785 - u32 num_pages; 786 - unsigned long ci; 787 - spinlock_t lock; /* sync list ops */ 788 - }; 789 - 790 - struct mlx5_cache_ent { 791 - struct mlx5_mkeys_queue mkeys_queue; 792 - u32 pending; 793 - 794 - char name[4]; 795 - 796 - struct rb_node node; 797 - struct mlx5r_cache_rb_key rb_key; 798 - 799 - u8 is_tmp:1; 800 - u8 disabled:1; 801 - u8 fill_to_high_water:1; 802 - u8 tmp_cleanup_scheduled:1; 803 - 804 - /* 805 - * - limit is the low water mark for stored mkeys, 2* limit is the 806 - * upper water mark. 
807 - */ 808 - u32 in_use; 809 - u32 limit; 810 - 811 - /* Statistics */ 812 - u32 miss; 813 - 814 - struct mlx5_ib_dev *dev; 815 - struct delayed_work dwork; 816 - }; 817 - 818 - struct mlx5r_async_create_mkey { 819 - union { 820 - u32 in[MLX5_ST_SZ_BYTES(create_mkey_in)]; 821 - u32 out[MLX5_ST_SZ_DW(create_mkey_out)]; 822 - }; 823 - struct mlx5_async_work cb_work; 824 - struct mlx5_cache_ent *ent; 825 - u32 mkey; 826 - }; 827 - 828 - struct mlx5_mkey_cache { 829 - struct workqueue_struct *wq; 830 - struct rb_root rb_root; 831 - struct mutex rb_lock; 832 - struct dentry *fs_root; 833 - unsigned long last_add; 834 784 }; 835 785 836 786 struct mlx5_ib_port_resources { ··· 1057 1131 struct xarray event_xa; 1058 1132 }; 1059 1133 1060 - struct mlx5_var_table { 1134 + struct mlx5_var_region { 1061 1135 /* serialize updating the bitmap */ 1062 1136 struct mutex bitmap_lock; 1063 1137 unsigned long *bitmap; 1064 1138 u64 hw_start_addr; 1065 1139 u32 stride_size; 1066 1140 u64 num_var_hw_entries; 1141 + }; 1142 + 1143 + struct mlx5_var_table { 1144 + struct mlx5_var_region var_region; 1145 + struct mlx5_var_region tlp_var_region; 1067 1146 }; 1068 1147 1069 1148 struct mlx5_port_caps { ··· 1112 1181 struct mlx5_ib_resources devr; 1113 1182 1114 1183 atomic_t mkey_var; 1115 - struct mlx5_mkey_cache cache; 1116 - struct timer_list delay_timer; 1117 1184 /* Prevents soft lock on massive reg MRs */ 1118 1185 struct mutex slow_path_mutex; 1119 1186 struct ib_odp_caps odp_caps; ··· 1299 1370 size_t buflen, size_t *bc); 1300 1371 int mlx5_ib_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr, 1301 1372 struct uverbs_attr_bundle *attrs); 1373 + int mlx5_ib_create_user_cq(struct ib_cq *ibcq, 1374 + const struct ib_cq_init_attr *attr, 1375 + struct uverbs_attr_bundle *attrs); 1302 1376 int mlx5_ib_destroy_cq(struct ib_cq *cq, struct ib_udata *udata); 1303 1377 int mlx5_ib_poll_cq(struct ib_cq *ibcq, int num_entries, struct ib_wc *wc); 1304 1378 int 
mlx5_ib_pre_destroy_cq(struct ib_cq *cq); 1305 1379 void mlx5_ib_post_destroy_cq(struct ib_cq *cq); 1306 1380 int mlx5_ib_arm_cq(struct ib_cq *ibcq, enum ib_cq_notify_flags flags); 1307 1381 int mlx5_ib_modify_cq(struct ib_cq *cq, u16 cq_count, u16 cq_period); 1308 - int mlx5_ib_resize_cq(struct ib_cq *ibcq, int entries, struct ib_udata *udata); 1382 + int mlx5_ib_resize_cq(struct ib_cq *ibcq, unsigned int entries, 1383 + struct ib_udata *udata); 1309 1384 struct ib_mr *mlx5_ib_get_dma_mr(struct ib_pd *pd, int acc); 1310 1385 struct ib_mr *mlx5_ib_reg_user_mr(struct ib_pd *pd, u64 start, u64 length, 1311 1386 u64 virt_addr, int access_flags, ··· 1374 1441 void mlx5_ib_populate_pas(struct ib_umem *umem, size_t page_size, __be64 *pas, 1375 1442 u64 access_flags); 1376 1443 int mlx5_ib_get_cqe_size(struct ib_cq *ibcq); 1377 - int mlx5_mkey_cache_init(struct mlx5_ib_dev *dev); 1378 - void mlx5_mkey_cache_cleanup(struct mlx5_ib_dev *dev); 1379 - struct mlx5_cache_ent * 1380 - mlx5r_cache_create_ent_locked(struct mlx5_ib_dev *dev, 1381 - struct mlx5r_cache_rb_key rb_key, 1382 - bool persistent_entry); 1383 - 1444 + int mlx5r_frmr_pools_init(struct ib_device *device); 1445 + void mlx5r_frmr_pools_cleanup(struct ib_device *device); 1384 1446 struct mlx5_ib_mr *mlx5_mr_cache_alloc(struct mlx5_ib_dev *dev, 1385 1447 int access_flags, int access_mode, 1386 1448 int ndescs);

+196 -976

drivers/infiniband/hw/mlx5/mr.c

··· 31 31 * SOFTWARE. 32 32 */ 33 33 34 - 35 34 #include <linux/kref.h> 36 35 #include <linux/random.h> 37 36 #include <linux/debugfs.h> ··· 38 39 #include <linux/delay.h> 39 40 #include <linux/dma-buf.h> 40 41 #include <linux/dma-resv.h> 42 + #include <rdma/frmr_pools.h> 41 43 #include <rdma/ib_umem_odp.h> 42 44 #include "dm.h" 43 45 #include "mlx5_ib.h" ··· 46 46 #include "data_direct.h" 47 47 #include "dmah.h" 48 48 49 - enum { 50 - MAX_PENDING_REG_MR = 8, 51 - }; 49 + static int mkey_max_umr_order(struct mlx5_ib_dev *dev) 50 + { 51 + if (MLX5_CAP_GEN(dev->mdev, umr_extended_translation_offset)) 52 + return MLX5_MAX_UMR_EXTENDED_SHIFT; 53 + return MLX5_MAX_UMR_SHIFT; 54 + } 52 55 53 - #define MLX5_MR_CACHE_PERSISTENT_ENTRY_MIN_DESCS 4 54 - 55 - static void 56 - create_mkey_callback(int status, struct mlx5_async_work *context); 57 56 static struct mlx5_ib_mr *reg_create(struct ib_pd *pd, struct ib_umem *umem, 58 57 u64 iova, int access_flags, 59 58 unsigned long page_size, bool populate, ··· 109 110 return ret; 110 111 } 111 112 112 - static int mlx5_ib_create_mkey_cb(struct mlx5r_async_create_mkey *async_create) 113 - { 114 - struct mlx5_ib_dev *dev = async_create->ent->dev; 115 - size_t inlen = MLX5_ST_SZ_BYTES(create_mkey_in); 116 - size_t outlen = MLX5_ST_SZ_BYTES(create_mkey_out); 117 - 118 - MLX5_SET(create_mkey_in, async_create->in, opcode, 119 - MLX5_CMD_OP_CREATE_MKEY); 120 - assign_mkey_variant(dev, &async_create->mkey, async_create->in); 121 - return mlx5_cmd_exec_cb(&dev->async_ctx, async_create->in, inlen, 122 - async_create->out, outlen, create_mkey_callback, 123 - &async_create->cb_work); 124 - } 125 - 126 - static int mkey_cache_max_order(struct mlx5_ib_dev *dev); 127 - static void queue_adjust_cache_locked(struct mlx5_cache_ent *ent); 128 - 129 113 static int destroy_mkey(struct mlx5_ib_dev *dev, struct mlx5_ib_mr *mr) 130 114 { 131 115 WARN_ON(xa_load(&dev->odp_mkeys, mlx5_base_mkey(mr->mmkey.key))); 132 116 133 117 return 
mlx5_core_destroy_mkey(dev->mdev, mr->mmkey.key); 134 - } 135 - 136 - static void create_mkey_warn(struct mlx5_ib_dev *dev, int status, void *out) 137 - { 138 - if (status == -ENXIO) /* core driver is not available */ 139 - return; 140 - 141 - mlx5_ib_warn(dev, "async reg mr failed. status %d\n", status); 142 - if (status != -EREMOTEIO) /* driver specific failure */ 143 - return; 144 - 145 - /* Failed in FW, print cmd out failure details */ 146 - mlx5_cmd_out_err(dev->mdev, MLX5_CMD_OP_CREATE_MKEY, 0, out); 147 - } 148 - 149 - static int push_mkey_locked(struct mlx5_cache_ent *ent, u32 mkey) 150 - { 151 - unsigned long tmp = ent->mkeys_queue.ci % NUM_MKEYS_PER_PAGE; 152 - struct mlx5_mkeys_page *page; 153 - 154 - lockdep_assert_held(&ent->mkeys_queue.lock); 155 - if (ent->mkeys_queue.ci >= 156 - ent->mkeys_queue.num_pages * NUM_MKEYS_PER_PAGE) { 157 - page = kzalloc_obj(*page, GFP_ATOMIC); 158 - if (!page) 159 - return -ENOMEM; 160 - ent->mkeys_queue.num_pages++; 161 - list_add_tail(&page->list, &ent->mkeys_queue.pages_list); 162 - } else { 163 - page = list_last_entry(&ent->mkeys_queue.pages_list, 164 - struct mlx5_mkeys_page, list); 165 - } 166 - 167 - page->mkeys[tmp] = mkey; 168 - ent->mkeys_queue.ci++; 169 - return 0; 170 - } 171 - 172 - static int pop_mkey_locked(struct mlx5_cache_ent *ent) 173 - { 174 - unsigned long tmp = (ent->mkeys_queue.ci - 1) % NUM_MKEYS_PER_PAGE; 175 - struct mlx5_mkeys_page *last_page; 176 - u32 mkey; 177 - 178 - lockdep_assert_held(&ent->mkeys_queue.lock); 179 - last_page = list_last_entry(&ent->mkeys_queue.pages_list, 180 - struct mlx5_mkeys_page, list); 181 - mkey = last_page->mkeys[tmp]; 182 - last_page->mkeys[tmp] = 0; 183 - ent->mkeys_queue.ci--; 184 - if (ent->mkeys_queue.num_pages > 1 && !tmp) { 185 - list_del(&last_page->list); 186 - ent->mkeys_queue.num_pages--; 187 - kfree(last_page); 188 - } 189 - return mkey; 190 - } 191 - 192 - static void create_mkey_callback(int status, struct mlx5_async_work *context) 193 - { 194 - 
struct mlx5r_async_create_mkey *mkey_out = 195 - container_of(context, struct mlx5r_async_create_mkey, cb_work); 196 - struct mlx5_cache_ent *ent = mkey_out->ent; 197 - struct mlx5_ib_dev *dev = ent->dev; 198 - unsigned long flags; 199 - 200 - if (status) { 201 - create_mkey_warn(dev, status, mkey_out->out); 202 - kfree(mkey_out); 203 - spin_lock_irqsave(&ent->mkeys_queue.lock, flags); 204 - ent->pending--; 205 - WRITE_ONCE(dev->fill_delay, 1); 206 - spin_unlock_irqrestore(&ent->mkeys_queue.lock, flags); 207 - mod_timer(&dev->delay_timer, jiffies + HZ); 208 - return; 209 - } 210 - 211 - mkey_out->mkey |= mlx5_idx_to_mkey( 212 - MLX5_GET(create_mkey_out, mkey_out->out, mkey_index)); 213 - WRITE_ONCE(dev->cache.last_add, jiffies); 214 - 215 - spin_lock_irqsave(&ent->mkeys_queue.lock, flags); 216 - push_mkey_locked(ent, mkey_out->mkey); 217 - ent->pending--; 218 - /* If we are doing fill_to_high_water then keep going. */ 219 - queue_adjust_cache_locked(ent); 220 - spin_unlock_irqrestore(&ent->mkeys_queue.lock, flags); 221 - kfree(mkey_out); 222 118 } 223 119 224 120 static int get_mkc_octo_size(unsigned int access_mode, unsigned int ndescs) ··· 133 239 WARN_ON(1); 134 240 } 135 241 return ret; 136 - } 137 - 138 - static void set_cache_mkc(struct mlx5_cache_ent *ent, void *mkc) 139 - { 140 - set_mkc_access_pd_addr_fields(mkc, ent->rb_key.access_flags, 0, 141 - ent->dev->umrc.pd); 142 - MLX5_SET(mkc, mkc, free, 1); 143 - MLX5_SET(mkc, mkc, umr_en, 1); 144 - MLX5_SET(mkc, mkc, access_mode_1_0, ent->rb_key.access_mode & 0x3); 145 - MLX5_SET(mkc, mkc, access_mode_4_2, 146 - (ent->rb_key.access_mode >> 2) & 0x7); 147 - MLX5_SET(mkc, mkc, ma_translation_mode, !!ent->rb_key.ats); 148 - 149 - MLX5_SET(mkc, mkc, translations_octword_size, 150 - get_mkc_octo_size(ent->rb_key.access_mode, 151 - ent->rb_key.ndescs)); 152 - MLX5_SET(mkc, mkc, log_page_size, PAGE_SHIFT); 153 - 154 - if (ent->rb_key.ph != MLX5_IB_NO_PH) { 155 - MLX5_SET(mkc, mkc, pcie_tph_en, 1); 156 - MLX5_SET(mkc, 
mkc, pcie_tph_ph, ent->rb_key.ph); 157 - if (ent->rb_key.st_index != MLX5_MKC_PCIE_TPH_NO_STEERING_TAG_INDEX) 158 - MLX5_SET(mkc, mkc, pcie_tph_steering_tag_index, 159 - ent->rb_key.st_index); 160 - } 161 - } 162 - 163 - /* Asynchronously schedule new MRs to be populated in the cache. */ 164 - static int add_keys(struct mlx5_cache_ent *ent, unsigned int num) 165 - { 166 - struct mlx5r_async_create_mkey *async_create; 167 - void *mkc; 168 - int err = 0; 169 - int i; 170 - 171 - for (i = 0; i < num; i++) { 172 - async_create = kzalloc_obj(struct mlx5r_async_create_mkey); 173 - if (!async_create) 174 - return -ENOMEM; 175 - mkc = MLX5_ADDR_OF(create_mkey_in, async_create->in, 176 - memory_key_mkey_entry); 177 - set_cache_mkc(ent, mkc); 178 - async_create->ent = ent; 179 - 180 - spin_lock_irq(&ent->mkeys_queue.lock); 181 - if (ent->pending >= MAX_PENDING_REG_MR) { 182 - err = -EAGAIN; 183 - goto free_async_create; 184 - } 185 - ent->pending++; 186 - spin_unlock_irq(&ent->mkeys_queue.lock); 187 - 188 - err = mlx5_ib_create_mkey_cb(async_create); 189 - if (err) { 190 - mlx5_ib_warn(ent->dev, "create mkey failed %d\n", err); 191 - goto err_create_mkey; 192 - } 193 - } 194 - 195 - return 0; 196 - 197 - err_create_mkey: 198 - spin_lock_irq(&ent->mkeys_queue.lock); 199 - ent->pending--; 200 - free_async_create: 201 - spin_unlock_irq(&ent->mkeys_queue.lock); 202 - kfree(async_create); 203 - return err; 204 - } 205 - 206 - /* Synchronously create a MR in the cache */ 207 - static int create_cache_mkey(struct mlx5_cache_ent *ent, u32 *mkey) 208 - { 209 - size_t inlen = MLX5_ST_SZ_BYTES(create_mkey_in); 210 - void *mkc; 211 - u32 *in; 212 - int err; 213 - 214 - in = kzalloc(inlen, GFP_KERNEL); 215 - if (!in) 216 - return -ENOMEM; 217 - mkc = MLX5_ADDR_OF(create_mkey_in, in, memory_key_mkey_entry); 218 - set_cache_mkc(ent, mkc); 219 - 220 - err = mlx5_core_create_mkey(ent->dev->mdev, mkey, in, inlen); 221 - if (err) 222 - goto free_in; 223 - 224 - 
WRITE_ONCE(ent->dev->cache.last_add, jiffies); 225 - free_in: 226 - kfree(in); 227 - return err; 228 - } 229 - 230 - static void remove_cache_mr_locked(struct mlx5_cache_ent *ent) 231 - { 232 - u32 mkey; 233 - 234 - lockdep_assert_held(&ent->mkeys_queue.lock); 235 - if (!ent->mkeys_queue.ci) 236 - return; 237 - mkey = pop_mkey_locked(ent); 238 - spin_unlock_irq(&ent->mkeys_queue.lock); 239 - mlx5_core_destroy_mkey(ent->dev->mdev, mkey); 240 - spin_lock_irq(&ent->mkeys_queue.lock); 241 - } 242 - 243 - static int resize_available_mrs(struct mlx5_cache_ent *ent, unsigned int target, 244 - bool limit_fill) 245 - __acquires(&ent->mkeys_queue.lock) __releases(&ent->mkeys_queue.lock) 246 - { 247 - int err; 248 - 249 - lockdep_assert_held(&ent->mkeys_queue.lock); 250 - 251 - while (true) { 252 - if (limit_fill) 253 - target = ent->limit * 2; 254 - if (target == ent->pending + ent->mkeys_queue.ci) 255 - return 0; 256 - if (target > ent->pending + ent->mkeys_queue.ci) { 257 - u32 todo = target - (ent->pending + ent->mkeys_queue.ci); 258 - 259 - spin_unlock_irq(&ent->mkeys_queue.lock); 260 - err = add_keys(ent, todo); 261 - if (err == -EAGAIN) 262 - usleep_range(3000, 5000); 263 - spin_lock_irq(&ent->mkeys_queue.lock); 264 - if (err) { 265 - if (err != -EAGAIN) 266 - return err; 267 - } else 268 - return 0; 269 - } else { 270 - remove_cache_mr_locked(ent); 271 - } 272 - } 273 - } 274 - 275 - static ssize_t size_write(struct file *filp, const char __user *buf, 276 - size_t count, loff_t *pos) 277 - { 278 - struct mlx5_cache_ent *ent = filp->private_data; 279 - u32 target; 280 - int err; 281 - 282 - err = kstrtou32_from_user(buf, count, 0, &target); 283 - if (err) 284 - return err; 285 - 286 - /* 287 - * Target is the new value of total_mrs the user requests, however we 288 - * cannot free MRs that are in use. Compute the target value for stored 289 - * mkeys. 
290 - */ 291 - spin_lock_irq(&ent->mkeys_queue.lock); 292 - if (target < ent->in_use) { 293 - err = -EINVAL; 294 - goto err_unlock; 295 - } 296 - target = target - ent->in_use; 297 - if (target < ent->limit || target > ent->limit*2) { 298 - err = -EINVAL; 299 - goto err_unlock; 300 - } 301 - err = resize_available_mrs(ent, target, false); 302 - if (err) 303 - goto err_unlock; 304 - spin_unlock_irq(&ent->mkeys_queue.lock); 305 - 306 - return count; 307 - 308 - err_unlock: 309 - spin_unlock_irq(&ent->mkeys_queue.lock); 310 - return err; 311 - } 312 - 313 - static ssize_t size_read(struct file *filp, char __user *buf, size_t count, 314 - loff_t *pos) 315 - { 316 - struct mlx5_cache_ent *ent = filp->private_data; 317 - char lbuf[20]; 318 - int err; 319 - 320 - err = snprintf(lbuf, sizeof(lbuf), "%ld\n", 321 - ent->mkeys_queue.ci + ent->in_use); 322 - if (err < 0) 323 - return err; 324 - 325 - return simple_read_from_buffer(buf, count, pos, lbuf, err); 326 - } 327 - 328 - static const struct file_operations size_fops = { 329 - .owner = THIS_MODULE, 330 - .open = simple_open, 331 - .write = size_write, 332 - .read = size_read, 333 - }; 334 - 335 - static ssize_t limit_write(struct file *filp, const char __user *buf, 336 - size_t count, loff_t *pos) 337 - { 338 - struct mlx5_cache_ent *ent = filp->private_data; 339 - u32 var; 340 - int err; 341 - 342 - err = kstrtou32_from_user(buf, count, 0, &var); 343 - if (err) 344 - return err; 345 - 346 - /* 347 - * Upon set we immediately fill the cache to high water mark implied by 348 - * the limit. 
349 - */ 350 - spin_lock_irq(&ent->mkeys_queue.lock); 351 - ent->limit = var; 352 - err = resize_available_mrs(ent, 0, true); 353 - spin_unlock_irq(&ent->mkeys_queue.lock); 354 - if (err) 355 - return err; 356 - return count; 357 - } 358 - 359 - static ssize_t limit_read(struct file *filp, char __user *buf, size_t count, 360 - loff_t *pos) 361 - { 362 - struct mlx5_cache_ent *ent = filp->private_data; 363 - char lbuf[20]; 364 - int err; 365 - 366 - err = snprintf(lbuf, sizeof(lbuf), "%d\n", ent->limit); 367 - if (err < 0) 368 - return err; 369 - 370 - return simple_read_from_buffer(buf, count, pos, lbuf, err); 371 - } 372 - 373 - static const struct file_operations limit_fops = { 374 - .owner = THIS_MODULE, 375 - .open = simple_open, 376 - .write = limit_write, 377 - .read = limit_read, 378 - }; 379 - 380 - static bool someone_adding(struct mlx5_mkey_cache *cache) 381 - { 382 - struct mlx5_cache_ent *ent; 383 - struct rb_node *node; 384 - bool ret; 385 - 386 - mutex_lock(&cache->rb_lock); 387 - for (node = rb_first(&cache->rb_root); node; node = rb_next(node)) { 388 - ent = rb_entry(node, struct mlx5_cache_ent, node); 389 - spin_lock_irq(&ent->mkeys_queue.lock); 390 - ret = ent->mkeys_queue.ci < ent->limit; 391 - spin_unlock_irq(&ent->mkeys_queue.lock); 392 - if (ret) { 393 - mutex_unlock(&cache->rb_lock); 394 - return true; 395 - } 396 - } 397 - mutex_unlock(&cache->rb_lock); 398 - return false; 399 - } 400 - 401 - /* 402 - * Check if the bucket is outside the high/low water mark and schedule an async 403 - * update. The cache refill has hysteresis, once the low water mark is hit it is 404 - * refilled up to the high mark. 
405 - */ 406 - static void queue_adjust_cache_locked(struct mlx5_cache_ent *ent) 407 - { 408 - lockdep_assert_held(&ent->mkeys_queue.lock); 409 - 410 - if (ent->disabled || READ_ONCE(ent->dev->fill_delay) || ent->is_tmp) 411 - return; 412 - if (ent->mkeys_queue.ci < ent->limit) { 413 - ent->fill_to_high_water = true; 414 - mod_delayed_work(ent->dev->cache.wq, &ent->dwork, 0); 415 - } else if (ent->fill_to_high_water && 416 - ent->mkeys_queue.ci + ent->pending < 2 * ent->limit) { 417 - /* 418 - * Once we start populating due to hitting a low water mark 419 - * continue until we pass the high water mark. 420 - */ 421 - mod_delayed_work(ent->dev->cache.wq, &ent->dwork, 0); 422 - } else if (ent->mkeys_queue.ci == 2 * ent->limit) { 423 - ent->fill_to_high_water = false; 424 - } else if (ent->mkeys_queue.ci > 2 * ent->limit) { 425 - /* Queue deletion of excess entries */ 426 - ent->fill_to_high_water = false; 427 - if (ent->pending) 428 - queue_delayed_work(ent->dev->cache.wq, &ent->dwork, 429 - secs_to_jiffies(1)); 430 - else 431 - mod_delayed_work(ent->dev->cache.wq, &ent->dwork, 0); 432 - } 433 - } 434 - 435 - static void clean_keys(struct mlx5_ib_dev *dev, struct mlx5_cache_ent *ent) 436 - { 437 - u32 mkey; 438 - 439 - spin_lock_irq(&ent->mkeys_queue.lock); 440 - while (ent->mkeys_queue.ci) { 441 - mkey = pop_mkey_locked(ent); 442 - spin_unlock_irq(&ent->mkeys_queue.lock); 443 - mlx5_core_destroy_mkey(dev->mdev, mkey); 444 - spin_lock_irq(&ent->mkeys_queue.lock); 445 - } 446 - ent->tmp_cleanup_scheduled = false; 447 - spin_unlock_irq(&ent->mkeys_queue.lock); 448 - } 449 - 450 - static void __cache_work_func(struct mlx5_cache_ent *ent) 451 - { 452 - struct mlx5_ib_dev *dev = ent->dev; 453 - struct mlx5_mkey_cache *cache = &dev->cache; 454 - int err; 455 - 456 - spin_lock_irq(&ent->mkeys_queue.lock); 457 - if (ent->disabled) 458 - goto out; 459 - 460 - if (ent->fill_to_high_water && 461 - ent->mkeys_queue.ci + ent->pending < 2 * ent->limit && 462 - 
!READ_ONCE(dev->fill_delay)) { 463 - spin_unlock_irq(&ent->mkeys_queue.lock); 464 - err = add_keys(ent, 1); 465 - spin_lock_irq(&ent->mkeys_queue.lock); 466 - if (ent->disabled) 467 - goto out; 468 - if (err) { 469 - /* 470 - * EAGAIN only happens if there are pending MRs, so we 471 - * will be rescheduled when storing them. The only 472 - * failure path here is ENOMEM. 473 - */ 474 - if (err != -EAGAIN) { 475 - mlx5_ib_warn( 476 - dev, 477 - "add keys command failed, err %d\n", 478 - err); 479 - queue_delayed_work(cache->wq, &ent->dwork, 480 - secs_to_jiffies(1)); 481 - } 482 - } 483 - } else if (ent->mkeys_queue.ci > 2 * ent->limit) { 484 - bool need_delay; 485 - 486 - /* 487 - * The remove_cache_mr() logic is performed as garbage 488 - * collection task. Such task is intended to be run when no 489 - * other active processes are running. 490 - * 491 - * The need_resched() will return TRUE if there are user tasks 492 - * to be activated in near future. 493 - * 494 - * In such case, we don't execute remove_cache_mr() and postpone 495 - * the garbage collection work to try to run in next cycle, in 496 - * order to free CPU resources to other tasks. 
497 - */ 498 - spin_unlock_irq(&ent->mkeys_queue.lock); 499 - need_delay = need_resched() || someone_adding(cache) || 500 - !time_after(jiffies, 501 - READ_ONCE(cache->last_add) + 300 * HZ); 502 - spin_lock_irq(&ent->mkeys_queue.lock); 503 - if (ent->disabled) 504 - goto out; 505 - if (need_delay) { 506 - queue_delayed_work(cache->wq, &ent->dwork, 300 * HZ); 507 - goto out; 508 - } 509 - remove_cache_mr_locked(ent); 510 - queue_adjust_cache_locked(ent); 511 - } 512 - out: 513 - spin_unlock_irq(&ent->mkeys_queue.lock); 514 - } 515 - 516 - static void delayed_cache_work_func(struct work_struct *work) 517 - { 518 - struct mlx5_cache_ent *ent; 519 - 520 - ent = container_of(work, struct mlx5_cache_ent, dwork.work); 521 - /* temp entries are never filled, only cleaned */ 522 - if (ent->is_tmp) 523 - clean_keys(ent->dev, ent); 524 - else 525 - __cache_work_func(ent); 526 - } 527 - 528 - static int cache_ent_key_cmp(struct mlx5r_cache_rb_key key1, 529 - struct mlx5r_cache_rb_key key2) 530 - { 531 - int res; 532 - 533 - res = key1.ats - key2.ats; 534 - if (res) 535 - return res; 536 - 537 - res = key1.access_mode - key2.access_mode; 538 - if (res) 539 - return res; 540 - 541 - res = key1.access_flags - key2.access_flags; 542 - if (res) 543 - return res; 544 - 545 - res = key1.st_index - key2.st_index; 546 - if (res) 547 - return res; 548 - 549 - res = key1.ph - key2.ph; 550 - if (res) 551 - return res; 552 - 553 - /* 554 - * keep ndescs the last in the compare table since the find function 555 - * searches for an exact match on all properties and only closest 556 - * match in size. 
557 - */ 558 - return key1.ndescs - key2.ndescs; 559 - } 560 - 561 - static int mlx5_cache_ent_insert(struct mlx5_mkey_cache *cache, 562 - struct mlx5_cache_ent *ent) 563 - { 564 - struct rb_node **new = &cache->rb_root.rb_node, *parent = NULL; 565 - struct mlx5_cache_ent *cur; 566 - int cmp; 567 - 568 - /* Figure out where to put new node */ 569 - while (*new) { 570 - cur = rb_entry(*new, struct mlx5_cache_ent, node); 571 - parent = *new; 572 - cmp = cache_ent_key_cmp(cur->rb_key, ent->rb_key); 573 - if (cmp > 0) 574 - new = &((*new)->rb_left); 575 - if (cmp < 0) 576 - new = &((*new)->rb_right); 577 - if (cmp == 0) 578 - return -EEXIST; 579 - } 580 - 581 - /* Add new node and rebalance tree. */ 582 - rb_link_node(&ent->node, parent, new); 583 - rb_insert_color(&ent->node, &cache->rb_root); 584 - 585 - return 0; 586 - } 587 - 588 - static struct mlx5_cache_ent * 589 - mkey_cache_ent_from_rb_key(struct mlx5_ib_dev *dev, 590 - struct mlx5r_cache_rb_key rb_key) 591 - { 592 - struct rb_node *node = dev->cache.rb_root.rb_node; 593 - struct mlx5_cache_ent *cur, *smallest = NULL; 594 - u64 ndescs_limit; 595 - int cmp; 596 - 597 - /* 598 - * Find the smallest ent with order >= requested_order. 599 - */ 600 - while (node) { 601 - cur = rb_entry(node, struct mlx5_cache_ent, node); 602 - cmp = cache_ent_key_cmp(cur->rb_key, rb_key); 603 - if (cmp > 0) { 604 - smallest = cur; 605 - node = node->rb_left; 606 - } 607 - if (cmp < 0) 608 - node = node->rb_right; 609 - if (cmp == 0) 610 - return cur; 611 - } 612 - 613 - /* 614 - * Limit the usage of mkeys larger than twice the required size while 615 - * also allowing the usage of smallest cache entry for small MRs. 
616 - */ 617 - ndescs_limit = max_t(u64, rb_key.ndescs * 2, 618 - MLX5_MR_CACHE_PERSISTENT_ENTRY_MIN_DESCS); 619 - 620 - return (smallest && 621 - smallest->rb_key.access_mode == rb_key.access_mode && 622 - smallest->rb_key.access_flags == rb_key.access_flags && 623 - smallest->rb_key.ats == rb_key.ats && 624 - smallest->rb_key.st_index == rb_key.st_index && 625 - smallest->rb_key.ph == rb_key.ph && 626 - smallest->rb_key.ndescs <= ndescs_limit) ? 627 - smallest : 628 - NULL; 629 - } 630 - 631 - static struct mlx5_ib_mr *_mlx5_mr_cache_alloc(struct mlx5_ib_dev *dev, 632 - struct mlx5_cache_ent *ent) 633 - { 634 - struct mlx5_ib_mr *mr; 635 - int err; 636 - 637 - mr = kzalloc_obj(*mr); 638 - if (!mr) 639 - return ERR_PTR(-ENOMEM); 640 - 641 - spin_lock_irq(&ent->mkeys_queue.lock); 642 - ent->in_use++; 643 - 644 - if (!ent->mkeys_queue.ci) { 645 - queue_adjust_cache_locked(ent); 646 - ent->miss++; 647 - spin_unlock_irq(&ent->mkeys_queue.lock); 648 - err = create_cache_mkey(ent, &mr->mmkey.key); 649 - if (err) { 650 - spin_lock_irq(&ent->mkeys_queue.lock); 651 - ent->in_use--; 652 - spin_unlock_irq(&ent->mkeys_queue.lock); 653 - kfree(mr); 654 - return ERR_PTR(err); 655 - } 656 - } else { 657 - mr->mmkey.key = pop_mkey_locked(ent); 658 - queue_adjust_cache_locked(ent); 659 - spin_unlock_irq(&ent->mkeys_queue.lock); 660 - } 661 - mr->mmkey.cache_ent = ent; 662 - mr->mmkey.type = MLX5_MKEY_MR; 663 - mr->mmkey.rb_key = ent->rb_key; 664 - mr->mmkey.cacheable = true; 665 - init_waitqueue_head(&mr->mmkey.wait); 666 - return mr; 667 242 } 668 243 669 244 static int get_unchangeable_access_flags(struct mlx5_ib_dev *dev, ··· 159 796 return ret; 160 797 } 161 798 799 + #define MLX5_FRMR_POOLS_KEY_ACCESS_MODE_KSM_MASK 1ULL 800 + #define MLX5_FRMR_POOLS_KEY_VENDOR_KEY_SUPPORTED \ 801 + MLX5_FRMR_POOLS_KEY_ACCESS_MODE_KSM_MASK 802 + 803 + #define MLX5_FRMR_POOLS_KERNEL_KEY_PH_SHIFT 16 804 + #define MLX5_FRMR_POOLS_KERNEL_KEY_PH_MASK 0xFF0000 805 + #define 
MLX5_FRMR_POOLS_KERNEL_KEY_ST_INDEX_MASK 0xFFFF 806 + 807 + static struct mlx5_ib_mr * 808 + _mlx5_frmr_pool_alloc(struct mlx5_ib_dev *dev, struct ib_umem *umem, 809 + int access_flags, int access_mode, 810 + unsigned long page_size, u16 st_index, u8 ph) 811 + { 812 + struct mlx5_ib_mr *mr; 813 + int err; 814 + 815 + mr = kzalloc_obj(*mr); 816 + if (!mr) 817 + return ERR_PTR(-ENOMEM); 818 + 819 + mr->ibmr.frmr.key.ats = mlx5_umem_needs_ats(dev, umem, access_flags); 820 + mr->ibmr.frmr.key.access_flags = 821 + get_unchangeable_access_flags(dev, access_flags); 822 + mr->ibmr.frmr.key.num_dma_blocks = 823 + ib_umem_num_dma_blocks(umem, page_size); 824 + mr->ibmr.frmr.key.vendor_key = 825 + access_mode == MLX5_MKC_ACCESS_MODE_KSM ? 826 + MLX5_FRMR_POOLS_KEY_ACCESS_MODE_KSM_MASK : 827 + 0; 828 + 829 + /* Normalize ph: swap 0 and MLX5_IB_NO_PH */ 830 + if (ph == MLX5_IB_NO_PH || ph == 0) 831 + ph ^= MLX5_IB_NO_PH; 832 + 833 + mr->ibmr.frmr.key.kernel_vendor_key = 834 + st_index | (ph << MLX5_FRMR_POOLS_KERNEL_KEY_PH_SHIFT); 835 + err = ib_frmr_pool_pop(&dev->ib_dev, &mr->ibmr); 836 + if (err) { 837 + kfree(mr); 838 + return ERR_PTR(err); 839 + } 840 + mr->mmkey.key = mr->ibmr.frmr.handle; 841 + init_waitqueue_head(&mr->mmkey.wait); 842 + 843 + return mr; 844 + } 845 + 162 846 struct mlx5_ib_mr *mlx5_mr_cache_alloc(struct mlx5_ib_dev *dev, 163 847 int access_flags, int access_mode, 164 848 int ndescs) 165 849 { 166 - struct mlx5r_cache_rb_key rb_key = { 167 - .ndescs = ndescs, 168 - .access_mode = access_mode, 169 - .access_flags = get_unchangeable_access_flags(dev, access_flags), 170 - .ph = MLX5_IB_NO_PH, 850 + struct ib_frmr_key key = { 851 + .access_flags = 852 + get_unchangeable_access_flags(dev, access_flags), 853 + .vendor_key = access_mode == MLX5_MKC_ACCESS_MODE_MTT ? 
854 + 0 : 855 + MLX5_FRMR_POOLS_KEY_ACCESS_MODE_KSM_MASK, 856 + .num_dma_blocks = ndescs, 857 + .kernel_vendor_key = 0, /* no PH and no ST index */ 171 858 }; 172 - struct mlx5_cache_ent *ent = mkey_cache_ent_from_rb_key(dev, rb_key); 173 - 174 - if (!ent) 175 - return ERR_PTR(-EOPNOTSUPP); 176 - 177 - return _mlx5_mr_cache_alloc(dev, ent); 178 - } 179 - 180 - static void mlx5_mkey_cache_debugfs_cleanup(struct mlx5_ib_dev *dev) 181 - { 182 - if (!mlx5_debugfs_root || dev->is_rep) 183 - return; 184 - 185 - debugfs_remove_recursive(dev->cache.fs_root); 186 - dev->cache.fs_root = NULL; 187 - } 188 - 189 - static void mlx5_mkey_cache_debugfs_add_ent(struct mlx5_ib_dev *dev, 190 - struct mlx5_cache_ent *ent) 191 - { 192 - int order = order_base_2(ent->rb_key.ndescs); 193 - struct dentry *dir; 194 - 195 - if (!mlx5_debugfs_root || dev->is_rep) 196 - return; 197 - 198 - if (ent->rb_key.access_mode == MLX5_MKC_ACCESS_MODE_KSM) 199 - order = MLX5_IMR_KSM_CACHE_ENTRY + 2; 200 - 201 - sprintf(ent->name, "%d", order); 202 - dir = debugfs_create_dir(ent->name, dev->cache.fs_root); 203 - debugfs_create_file("size", 0600, dir, ent, &size_fops); 204 - debugfs_create_file("limit", 0600, dir, ent, &limit_fops); 205 - debugfs_create_ulong("cur", 0400, dir, &ent->mkeys_queue.ci); 206 - debugfs_create_u32("miss", 0600, dir, &ent->miss); 207 - } 208 - 209 - static void mlx5_mkey_cache_debugfs_init(struct mlx5_ib_dev *dev) 210 - { 211 - struct dentry *dbg_root = mlx5_debugfs_get_dev_root(dev->mdev); 212 - struct mlx5_mkey_cache *cache = &dev->cache; 213 - 214 - if (!mlx5_debugfs_root || dev->is_rep) 215 - return; 216 - 217 - cache->fs_root = debugfs_create_dir("mr_cache", dbg_root); 218 - } 219 - 220 - static void delay_time_func(struct timer_list *t) 221 - { 222 - struct mlx5_ib_dev *dev = timer_container_of(dev, t, delay_timer); 223 - 224 - WRITE_ONCE(dev->fill_delay, 0); 225 - } 226 - 227 - static int mlx5r_mkeys_init(struct mlx5_cache_ent *ent) 228 - { 229 - struct mlx5_mkeys_page 
*page; 230 - 231 - page = kzalloc_obj(*page); 232 - if (!page) 233 - return -ENOMEM; 234 - INIT_LIST_HEAD(&ent->mkeys_queue.pages_list); 235 - spin_lock_init(&ent->mkeys_queue.lock); 236 - list_add_tail(&page->list, &ent->mkeys_queue.pages_list); 237 - ent->mkeys_queue.num_pages++; 238 - return 0; 239 - } 240 - 241 - static void mlx5r_mkeys_uninit(struct mlx5_cache_ent *ent) 242 - { 243 - struct mlx5_mkeys_page *page; 244 - 245 - WARN_ON(ent->mkeys_queue.ci || ent->mkeys_queue.num_pages > 1); 246 - page = list_last_entry(&ent->mkeys_queue.pages_list, 247 - struct mlx5_mkeys_page, list); 248 - list_del(&page->list); 249 - kfree(page); 250 - } 251 - 252 - struct mlx5_cache_ent * 253 - mlx5r_cache_create_ent_locked(struct mlx5_ib_dev *dev, 254 - struct mlx5r_cache_rb_key rb_key, 255 - bool persistent_entry) 256 - { 257 - struct mlx5_cache_ent *ent; 258 - int order; 859 + struct mlx5_ib_mr *mr; 259 860 int ret; 260 861 261 - ent = kzalloc_obj(*ent); 262 - if (!ent) 862 + mr = kzalloc_obj(*mr); 863 + if (!mr) 263 864 return ERR_PTR(-ENOMEM); 264 865 265 - ret = mlx5r_mkeys_init(ent); 266 - if (ret) 267 - goto mkeys_err; 268 - ent->rb_key = rb_key; 269 - ent->dev = dev; 270 - ent->is_tmp = !persistent_entry; 866 + init_waitqueue_head(&mr->mmkey.wait); 271 867 272 - INIT_DELAYED_WORK(&ent->dwork, delayed_cache_work_func); 273 - 274 - ret = mlx5_cache_ent_insert(&dev->cache, ent); 275 - if (ret) 276 - goto ent_insert_err; 277 - 278 - if (persistent_entry) { 279 - if (rb_key.access_mode == MLX5_MKC_ACCESS_MODE_KSM) 280 - order = MLX5_IMR_KSM_CACHE_ENTRY; 281 - else 282 - order = order_base_2(rb_key.ndescs) - 2; 283 - 284 - if ((dev->mdev->profile.mask & MLX5_PROF_MASK_MR_CACHE) && 285 - !dev->is_rep && mlx5_core_is_pf(dev->mdev) && 286 - mlx5r_umr_can_load_pas(dev, 0)) 287 - ent->limit = dev->mdev->profile.mr_cache[order].limit; 288 - else 289 - ent->limit = 0; 290 - 291 - mlx5_mkey_cache_debugfs_add_ent(dev, ent); 868 + mr->ibmr.frmr.key = key; 869 + ret = 
ib_frmr_pool_pop(&dev->ib_dev, &mr->ibmr); 870 + if (ret) { 871 + kfree(mr); 872 + return ERR_PTR(ret); 292 873 } 874 + mr->mmkey.key = mr->ibmr.frmr.handle; 875 + mr->mmkey.type = MLX5_MKEY_MR; 293 876 294 - return ent; 295 - ent_insert_err: 296 - mlx5r_mkeys_uninit(ent); 297 - mkeys_err: 298 - kfree(ent); 299 - return ERR_PTR(ret); 877 + return mr; 300 878 } 301 879 302 - static void mlx5r_destroy_cache_entries(struct mlx5_ib_dev *dev) 880 + static int mlx5r_create_mkeys(struct ib_device *device, struct ib_frmr_key *key, 881 + u32 *handles, unsigned int count) 303 882 { 304 - struct rb_root *root = &dev->cache.rb_root; 305 - struct mlx5_cache_ent *ent; 306 - struct rb_node *node; 883 + int access_mode = 884 + key->vendor_key & MLX5_FRMR_POOLS_KEY_ACCESS_MODE_KSM_MASK ? 885 + MLX5_MKC_ACCESS_MODE_KSM : 886 + MLX5_MKC_ACCESS_MODE_MTT; 307 887 308 - mutex_lock(&dev->cache.rb_lock); 309 - node = rb_first(root); 310 - while (node) { 311 - ent = rb_entry(node, struct mlx5_cache_ent, node); 312 - node = rb_next(node); 313 - clean_keys(dev, ent); 314 - rb_erase(&ent->node, root); 315 - mlx5r_mkeys_uninit(ent); 316 - kfree(ent); 317 - } 318 - mutex_unlock(&dev->cache.rb_lock); 319 - } 888 + struct mlx5_ib_dev *dev = to_mdev(device); 889 + size_t inlen = MLX5_ST_SZ_BYTES(create_mkey_in); 890 + u16 st_index; 891 + void *mkc; 892 + u32 *in; 893 + int err, i; 894 + u8 ph; 320 895 321 - int mlx5_mkey_cache_init(struct mlx5_ib_dev *dev) 322 - { 323 - struct mlx5_mkey_cache *cache = &dev->cache; 324 - struct rb_root *root = &dev->cache.rb_root; 325 - struct mlx5r_cache_rb_key rb_key = { 326 - .access_mode = MLX5_MKC_ACCESS_MODE_MTT, 327 - .ph = MLX5_IB_NO_PH, 328 - }; 329 - struct mlx5_cache_ent *ent; 330 - struct rb_node *node; 331 - int ret; 332 - int i; 333 - 334 - mutex_init(&dev->slow_path_mutex); 335 - mutex_init(&dev->cache.rb_lock); 336 - dev->cache.rb_root = RB_ROOT; 337 - cache->wq = alloc_ordered_workqueue("mkey_cache", WQ_MEM_RECLAIM); 338 - if (!cache->wq) { 339 - 
mlx5_ib_warn(dev, "failed to create work queue\n"); 896 + in = kzalloc(inlen, GFP_KERNEL); 897 + if (!in) 340 898 return -ENOMEM; 899 + mkc = MLX5_ADDR_OF(create_mkey_in, in, memory_key_mkey_entry); 900 + 901 + set_mkc_access_pd_addr_fields(mkc, key->access_flags, 0, dev->umrc.pd); 902 + MLX5_SET(mkc, mkc, free, 1); 903 + MLX5_SET(mkc, mkc, umr_en, 1); 904 + MLX5_SET(mkc, mkc, access_mode_1_0, access_mode & 0x3); 905 + MLX5_SET(mkc, mkc, access_mode_4_2, (access_mode >> 2) & 0x7); 906 + MLX5_SET(mkc, mkc, ma_translation_mode, !!key->ats); 907 + MLX5_SET(mkc, mkc, translations_octword_size, 908 + get_mkc_octo_size(access_mode, key->num_dma_blocks)); 909 + MLX5_SET(mkc, mkc, log_page_size, PAGE_SHIFT); 910 + 911 + st_index = key->kernel_vendor_key & 912 + MLX5_FRMR_POOLS_KERNEL_KEY_ST_INDEX_MASK; 913 + ph = key->kernel_vendor_key & MLX5_FRMR_POOLS_KERNEL_KEY_PH_MASK; 914 + if (ph) { 915 + /* Normalize ph: swap MLX5_IB_NO_PH for 0 */ 916 + if (ph == MLX5_IB_NO_PH) 917 + ph = 0; 918 + MLX5_SET(mkc, mkc, pcie_tph_en, 1); 919 + MLX5_SET(mkc, mkc, pcie_tph_ph, ph); 920 + if (st_index != MLX5_MKC_PCIE_TPH_NO_STEERING_TAG_INDEX) 921 + MLX5_SET(mkc, mkc, pcie_tph_steering_tag_index, 922 + st_index); 341 923 } 342 924 343 - mlx5_cmd_init_async_ctx(dev->mdev, &dev->async_ctx); 344 - timer_setup(&dev->delay_timer, delay_time_func, 0); 345 - mlx5_mkey_cache_debugfs_init(dev); 346 - mutex_lock(&cache->rb_lock); 347 - for (i = 0; i <= mkey_cache_max_order(dev); i++) { 348 - rb_key.ndescs = MLX5_MR_CACHE_PERSISTENT_ENTRY_MIN_DESCS << i; 349 - ent = mlx5r_cache_create_ent_locked(dev, rb_key, true); 350 - if (IS_ERR(ent)) { 351 - ret = PTR_ERR(ent); 352 - goto err; 353 - } 925 + for (i = 0; i < count; i++) { 926 + assign_mkey_variant(dev, handles + i, in); 927 + err = mlx5_core_create_mkey(dev->mdev, handles + i, in, inlen); 928 + if (err) 929 + goto free_in; 354 930 } 931 + free_in: 932 + kfree(in); 933 + if (err) 934 + for (; i > 0; i--) 935 + mlx5_core_destroy_mkey(dev->mdev, 
handles[i]); 936 + return err; 937 + } 355 938 356 - ret = mlx5_odp_init_mkey_cache(dev); 357 - if (ret) 358 - goto err; 939 + static void mlx5r_destroy_mkeys(struct ib_device *device, u32 *handles, 940 + unsigned int count) 941 + { 942 + struct mlx5_ib_dev *dev = to_mdev(device); 943 + int i, err; 359 944 360 - mutex_unlock(&cache->rb_lock); 361 - for (node = rb_first(root); node; node = rb_next(node)) { 362 - ent = rb_entry(node, struct mlx5_cache_ent, node); 363 - spin_lock_irq(&ent->mkeys_queue.lock); 364 - queue_adjust_cache_locked(ent); 365 - spin_unlock_irq(&ent->mkeys_queue.lock); 945 + for (i = 0; i < count; i++) { 946 + err = mlx5_core_destroy_mkey(dev->mdev, handles[i]); 947 + if (err) 948 + pr_warn_ratelimited( 949 + "mlx5_ib: failed to destroy mkey %d: %d", 950 + handles[i], err); 366 951 } 952 + } 953 + 954 + static int mlx5r_build_frmr_key(struct ib_device *device, 955 + const struct ib_frmr_key *in, 956 + struct ib_frmr_key *out) 957 + { 958 + struct mlx5_ib_dev *dev = to_mdev(device); 959 + 960 + /* check HW capabilities of users requested frmr key */ 961 + if ((in->ats && !MLX5_CAP_GEN(dev->mdev, ats)) || 962 + ilog2(in->num_dma_blocks) > mkey_max_umr_order(dev)) 963 + return -EOPNOTSUPP; 964 + 965 + if (in->vendor_key & ~MLX5_FRMR_POOLS_KEY_VENDOR_KEY_SUPPORTED) 966 + return -EOPNOTSUPP; 967 + 968 + out->ats = in->ats; 969 + out->access_flags = 970 + get_unchangeable_access_flags(dev, in->access_flags); 971 + out->vendor_key = in->vendor_key; 972 + out->num_dma_blocks = in->num_dma_blocks; 367 973 368 974 return 0; 369 - 370 - err: 371 - mutex_unlock(&cache->rb_lock); 372 - mlx5_mkey_cache_debugfs_cleanup(dev); 373 - mlx5r_destroy_cache_entries(dev); 374 - destroy_workqueue(cache->wq); 375 - mlx5_ib_warn(dev, "failed to create mkey cache entry\n"); 376 - return ret; 377 975 } 378 976 379 - void mlx5_mkey_cache_cleanup(struct mlx5_ib_dev *dev) 977 + static struct ib_frmr_pool_ops mlx5r_frmr_pool_ops = { 978 + .create_frmrs = mlx5r_create_mkeys, 
979 + .destroy_frmrs = mlx5r_destroy_mkeys, 980 + .build_key = mlx5r_build_frmr_key, 981 + }; 982 + 983 + int mlx5r_frmr_pools_init(struct ib_device *device) 380 984 { 381 - struct rb_root *root = &dev->cache.rb_root; 382 - struct mlx5_cache_ent *ent; 383 - struct rb_node *node; 985 + struct mlx5_ib_dev *dev = to_mdev(device); 384 986 385 - if (!dev->cache.wq) 386 - return; 987 + mutex_init(&dev->slow_path_mutex); 988 + return ib_frmr_pools_init(device, &mlx5r_frmr_pool_ops); 989 + } 387 990 388 - mutex_lock(&dev->cache.rb_lock); 389 - for (node = rb_first(root); node; node = rb_next(node)) { 390 - ent = rb_entry(node, struct mlx5_cache_ent, node); 391 - spin_lock_irq(&ent->mkeys_queue.lock); 392 - ent->disabled = true; 393 - spin_unlock_irq(&ent->mkeys_queue.lock); 394 - cancel_delayed_work(&ent->dwork); 395 - } 396 - mutex_unlock(&dev->cache.rb_lock); 397 - 398 - /* 399 - * After all entries are disabled and will not reschedule on WQ, 400 - * flush it and all async commands. 401 - */ 402 - flush_workqueue(dev->cache.wq); 403 - 404 - mlx5_mkey_cache_debugfs_cleanup(dev); 405 - mlx5_cmd_cleanup_async_ctx(&dev->async_ctx); 406 - 407 - /* At this point all entries are disabled and have no concurrent work. 
*/ 408 - mlx5r_destroy_cache_entries(dev); 409 - 410 - destroy_workqueue(dev->cache.wq); 411 - timer_delete_sync(&dev->delay_timer); 991 + void mlx5r_frmr_pools_cleanup(struct ib_device *device) 992 + { 993 + ib_frmr_pools_cleanup(device); 412 994 } 413 995 414 996 struct ib_mr *mlx5_ib_get_dma_mr(struct ib_pd *pd, int acc) ··· 415 1107 return (npages + 1) / 2; 416 1108 } 417 1109 418 - static int mkey_cache_max_order(struct mlx5_ib_dev *dev) 419 - { 420 - if (MLX5_CAP_GEN(dev->mdev, umr_extended_translation_offset)) 421 - return MKEY_CACHE_LAST_STD_ENTRY; 422 - return MLX5_MAX_UMR_SHIFT; 423 - } 424 - 425 1110 static void set_mr_fields(struct mlx5_ib_dev *dev, struct mlx5_ib_mr *mr, 426 1111 u64 length, int access_flags, u64 iova) 427 1112 { ··· 443 1142 u16 st_index, u8 ph) 444 1143 { 445 1144 struct mlx5_ib_dev *dev = to_mdev(pd->device); 446 - struct mlx5r_cache_rb_key rb_key = {}; 447 - struct mlx5_cache_ent *ent; 448 1145 struct mlx5_ib_mr *mr; 449 1146 unsigned long page_size; 450 1147 ··· 454 1155 if (WARN_ON(!page_size)) 455 1156 return ERR_PTR(-EINVAL); 456 1157 457 - rb_key.access_mode = access_mode; 458 - rb_key.ndescs = ib_umem_num_dma_blocks(umem, page_size); 459 - rb_key.ats = mlx5_umem_needs_ats(dev, umem, access_flags); 460 - rb_key.access_flags = get_unchangeable_access_flags(dev, access_flags); 461 - rb_key.st_index = st_index; 462 - rb_key.ph = ph; 463 - ent = mkey_cache_ent_from_rb_key(dev, rb_key); 464 - /* 465 - * If the MR can't come from the cache then synchronously create an uncached 466 - * one. 
467 - */ 468 - if (!ent) { 469 - mutex_lock(&dev->slow_path_mutex); 470 - mr = reg_create(pd, umem, iova, access_flags, page_size, false, access_mode, 471 - st_index, ph); 472 - mutex_unlock(&dev->slow_path_mutex); 473 - if (IS_ERR(mr)) 474 - return mr; 475 - mr->mmkey.rb_key = rb_key; 476 - mr->mmkey.cacheable = true; 477 - return mr; 478 - } 479 - 480 - mr = _mlx5_mr_cache_alloc(dev, ent); 1158 + mr = _mlx5_frmr_pool_alloc(dev, umem, access_flags, access_mode, 1159 + page_size, st_index, ph); 481 1160 if (IS_ERR(mr)) 482 1161 return mr; 483 1162 1163 + mr->mmkey.type = MLX5_MKEY_MR; 484 1164 mr->ibmr.pd = pd; 485 1165 mr->umem = umem; 486 1166 mr->page_shift = order_base_2(page_size); ··· 1089 1811 unsigned long *page_size) 1090 1812 { 1091 1813 struct mlx5_ib_dev *dev = to_mdev(mr->ibmr.device); 1814 + u8 access_mode; 1092 1815 1093 - /* We only track the allocated sizes of MRs from the cache */ 1094 - if (!mr->mmkey.cache_ent) 1816 + /* We only track the allocated sizes of MRs from the frmr pools */ 1817 + if (!mr->ibmr.frmr.pool) 1095 1818 return false; 1096 1819 if (!mlx5r_umr_can_load_pas(dev, new_umem->length)) 1097 1820 return false; 1098 1821 1099 - *page_size = mlx5_umem_mkc_find_best_pgsz( 1100 - dev, new_umem, iova, mr->mmkey.cache_ent->rb_key.access_mode); 1822 + access_mode = mr->ibmr.frmr.key.vendor_key & 1823 + MLX5_FRMR_POOLS_KEY_ACCESS_MODE_KSM_MASK ? 
1824 + MLX5_MKC_ACCESS_MODE_KSM : 1825 + MLX5_MKC_ACCESS_MODE_MTT; 1826 + 1827 + *page_size = 1828 + mlx5_umem_mkc_find_best_pgsz(dev, new_umem, iova, access_mode); 1101 1829 if (WARN_ON(!*page_size)) 1102 1830 return false; 1103 - return (mr->mmkey.cache_ent->rb_key.ndescs) >= 1831 + return (mr->ibmr.frmr.key.num_dma_blocks) >= 1104 1832 ib_umem_num_dma_blocks(new_umem, *page_size); 1105 1833 } 1106 1834 ··· 1167 1883 int err; 1168 1884 1169 1885 if (!IS_ENABLED(CONFIG_INFINIBAND_USER_MEM) || mr->data_direct || 1170 - mr->mmkey.rb_key.ph != MLX5_IB_NO_PH) 1886 + (mr->ibmr.frmr.key.kernel_vendor_key & 1887 + MLX5_FRMR_POOLS_KERNEL_KEY_PH_MASK) != 0) 1171 1888 return ERR_PTR(-EOPNOTSUPP); 1172 1889 1173 1890 mlx5_ib_dbg( ··· 1309 2024 } 1310 2025 } 1311 2026 1312 - static int cache_ent_find_and_store(struct mlx5_ib_dev *dev, 1313 - struct mlx5_ib_mr *mr) 1314 - { 1315 - struct mlx5_mkey_cache *cache = &dev->cache; 1316 - struct mlx5_cache_ent *ent; 1317 - int ret; 1318 - 1319 - if (mr->mmkey.cache_ent) { 1320 - spin_lock_irq(&mr->mmkey.cache_ent->mkeys_queue.lock); 1321 - goto end; 1322 - } 1323 - 1324 - mutex_lock(&cache->rb_lock); 1325 - ent = mkey_cache_ent_from_rb_key(dev, mr->mmkey.rb_key); 1326 - if (ent) { 1327 - if (ent->rb_key.ndescs == mr->mmkey.rb_key.ndescs) { 1328 - if (ent->disabled) { 1329 - mutex_unlock(&cache->rb_lock); 1330 - return -EOPNOTSUPP; 1331 - } 1332 - mr->mmkey.cache_ent = ent; 1333 - spin_lock_irq(&mr->mmkey.cache_ent->mkeys_queue.lock); 1334 - mutex_unlock(&cache->rb_lock); 1335 - goto end; 1336 - } 1337 - } 1338 - 1339 - ent = mlx5r_cache_create_ent_locked(dev, mr->mmkey.rb_key, false); 1340 - mutex_unlock(&cache->rb_lock); 1341 - if (IS_ERR(ent)) 1342 - return PTR_ERR(ent); 1343 - 1344 - mr->mmkey.cache_ent = ent; 1345 - spin_lock_irq(&mr->mmkey.cache_ent->mkeys_queue.lock); 1346 - 1347 - end: 1348 - ret = push_mkey_locked(mr->mmkey.cache_ent, mr->mmkey.key); 1349 - spin_unlock_irq(&mr->mmkey.cache_ent->mkeys_queue.lock); 1350 - 
return ret; 1351 - } 1352 - 1353 2027 static int mlx5_ib_revoke_data_direct_mr(struct mlx5_ib_mr *mr) 1354 2028 { 1355 2029 struct mlx5_ib_dev *dev = to_mdev(mr->ibmr.device); ··· 1374 2130 bool is_odp_dma_buf = is_dmabuf_mr(mr) && 1375 2131 !to_ib_umem_dmabuf(mr->umem)->pinned; 1376 2132 struct mlx5_ib_dev *dev = to_mdev(mr->ibmr.device); 1377 - struct mlx5_cache_ent *ent = mr->mmkey.cache_ent; 1378 2133 bool is_odp = is_odp_mr(mr); 1379 - bool from_cache = !!ent; 1380 2134 int ret; 1381 2135 1382 - if (mr->mmkey.cacheable && !mlx5_umr_revoke_mr_with_lock(mr) && 1383 - !cache_ent_find_and_store(dev, mr)) { 1384 - ent = mr->mmkey.cache_ent; 1385 - /* upon storing to a clean temp entry - schedule its cleanup */ 1386 - spin_lock_irq(&ent->mkeys_queue.lock); 1387 - if (from_cache) 1388 - ent->in_use--; 1389 - if (ent->is_tmp && !ent->tmp_cleanup_scheduled) { 1390 - mod_delayed_work(ent->dev->cache.wq, &ent->dwork, 1391 - secs_to_jiffies(30)); 1392 - ent->tmp_cleanup_scheduled = true; 1393 - } 1394 - spin_unlock_irq(&ent->mkeys_queue.lock); 2136 + if (mr->ibmr.frmr.pool && !mlx5_umr_revoke_mr_with_lock(mr) && 2137 + !ib_frmr_pool_push(mr->ibmr.device, &mr->ibmr)) 1395 2138 return 0; 1396 - } 1397 - 1398 - if (ent) { 1399 - spin_lock_irq(&ent->mkeys_queue.lock); 1400 - ent->in_use--; 1401 - mr->mmkey.cache_ent = NULL; 1402 - spin_unlock_irq(&ent->mkeys_queue.lock); 1403 - } 1404 2139 1405 2140 if (is_odp) 1406 2141 mutex_lock(&to_ib_umem_odp(mr->umem)->umem_mutex); ··· 1463 2240 mlx5_ib_free_odp_mr(mr); 1464 2241 } 1465 2242 1466 - if (!mr->mmkey.cache_ent) 2243 + if (!mr->ibmr.frmr.pool) 1467 2244 mlx5_free_priv_descs(mr); 1468 2245 1469 2246 kfree(mr); ··· 1772 2549 __u32 response_length; 1773 2550 } resp = {}; 1774 2551 1775 - err = ib_copy_from_udata(&req, udata, min(udata->inlen, sizeof(req))); 1776 - if (err) 1777 - return err; 2552 + if (udata->inlen) { 2553 + err = ib_copy_validate_udata_in_cm(udata, req, reserved2, 0); 2554 + if (err) 2555 + return err; 2556 + 
} 1778 2557 1779 - if (req.comp_mask || req.reserved1 || req.reserved2) 1780 - return -EOPNOTSUPP; 1781 - 1782 - if (udata->inlen > sizeof(req) && 1783 - !ib_is_udata_cleared(udata, sizeof(req), 1784 - udata->inlen - sizeof(req))) 2558 + if (req.reserved1 || req.reserved2) 1785 2559 return -EOPNOTSUPP; 1786 2560 1787 2561 ndescs = req.num_klms ? roundup(req.num_klms, 4) : roundup(1, 4);

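In place of that cache, mr.c now registers three callbacks (create_frmrs, destroy_frmrs, build_key) with ib_frmr_pools_init(). It takes mkeys from the core pool with ib_frmr_pool_pop() and returns them with ib_frmr_pool_push(). The core pool's internals are not part of this diff.

The sketch below therefore only illustrates the general ops-table shape, under these assumptions:
- one fixed-capacity free list for a single key (the hypothetical struct frmr_pool);
- a per-handle vendor create callback;
- a fake backend for demonstration.

It also shows the unwind a batch create needs. If creating handles[i] fails, only handles[0] through handles[i - 1] exist, so the rollback is while (i--). The rollback loop in mlx5r_create_mkeys(), as rendered above, starts at the failed index and stops before index 0.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical per-handle callbacks, loosely modeled on struct ib_frmr_pool_ops. */
struct pool_ops {
	int (*create_one)(uint32_t *handle); /* 0 or a negative errno */
	void (*destroy_one)(uint32_t handle);
};

/* Create count handles; on failure destroy exactly the ones already created. */
int create_batch(const struct pool_ops *ops, uint32_t *handles,
		 unsigned int count)
{
	unsigned int i;
	int err = 0; /* initialized so that count == 0 reports success */

	for (i = 0; i < count; i++) {
		err = ops->create_one(&handles[i]);
		if (err)
			break;
	}
	if (err)
		while (i--) /* handles[i] failed: unwind i - 1 down to 0 */
			ops->destroy_one(handles[i]);
	return err;
}

#define POOL_CAP 8

/* Hypothetical free list for a single key. */
struct frmr_pool {
	const struct pool_ops *ops;
	uint32_t free[POOL_CAP];
	unsigned int nfree;
};

/* Reuse a pooled handle if one is available, otherwise create one. */
int pool_pop(struct frmr_pool *p, uint32_t *handle)
{
	if (p->nfree) {
		*handle = p->free[--p->nfree];
		return 0;
	}
	return create_batch(p->ops, handle, 1);
}

/* Keep a released handle for reuse, or destroy it if the pool is full. */
void pool_push(struct frmr_pool *p, uint32_t handle)
{
	if (p->nfree < POOL_CAP)
		p->free[p->nfree++] = handle;
	else
		p->ops->destroy_one(handle);
}

/* Fake backend for demonstration: fails after fail_after successful creations. */
static unsigned int next_handle = 1, live, fail_after = ~0u;

static int fake_create(uint32_t *handle)
{
	if (next_handle > fail_after)
		return -ENOMEM;
	*handle = next_handle++;
	live++;
	return 0;
}

static void fake_destroy(uint32_t handle)
{
	(void)handle;
	live--;
}

static const struct pool_ops fake_ops = { fake_create, fake_destroy };
```

A push that finds the sketch's pool full destroys the handle instead of caching it, which bounds how many idle handles it keeps. The real pool's sizing policy is not shown in this diff.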
-19

drivers/infiniband/hw/mlx5/odp.c

···
1875 1875 return err;
1876 1876 }
1877 1877
1878 - int mlx5_odp_init_mkey_cache(struct mlx5_ib_dev *dev)
1879 - {
1880 - struct mlx5r_cache_rb_key rb_key = {
1881 - .access_mode = MLX5_MKC_ACCESS_MODE_KSM,
1882 - .ndescs = mlx5_imr_ksm_entries,
1883 - .ph = MLX5_IB_NO_PH,
1884 - };
1885 - struct mlx5_cache_ent *ent;
1886 -
1887 - if (!(dev->odp_caps.general_caps & IB_ODP_SUPPORT_IMPLICIT))
1888 - return 0;
1889 -
1890 - ent = mlx5r_cache_create_ent_locked(dev, rb_key, true);
1891 - if (IS_ERR(ent))
1892 - return PTR_ERR(ent);
1893 -
1894 - return 0;
1895 - }
1896 -
1897 1878 static const struct ib_device_ops mlx5_ib_dev_odp_ops = {
1898 1879 .advise_mr = mlx5_ib_advise_mr,
1899 1880 };
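With the dedicated KSM cache entry for implicit ODP gone, KSM mkeys are told apart from MTT mkeys by bit 0 of vendor_key (MLX5_FRMR_POOLS_KEY_ACCESS_MODE_KSM_MASK). The TPH processing hint and steering-tag index are packed into kernel_vendor_key as st_index | (ph << 16).

Before packing, _mlx5_frmr_pool_alloc() swaps 0 and MLX5_IB_NO_PH with ph ^= MLX5_IB_NO_PH. After the swap, a zero PH field means "no hint", consistent with mlx5_mr_cache_alloc(), which sets kernel_vendor_key to 0 for "no PH and no ST index".

The sketch below round-trips that encoding in userspace. It assumes MLX5_IB_NO_PH is 0xFF, since its value is not shown in this diff. Recovering PH needs a right shift by MLX5_FRMR_POOLS_KERNEL_KEY_PH_SHIFT after masking. Assigning kernel_vendor_key & MLX5_FRMR_POOLS_KERNEL_KEY_PH_MASK directly to a u8, as the mlx5r_create_mkeys() hunk does, always yields 0.

```c
#include <assert.h>
#include <stdint.h>

/* Values copied from the #defines in the mr.c hunk. */
#define KEY_ACCESS_MODE_KSM_MASK 1ULL
#define KERNEL_KEY_PH_SHIFT      16
#define KERNEL_KEY_PH_MASK       0xFF0000ULL
#define KERNEL_KEY_ST_INDEX_MASK 0xFFFFULL

/* Assumption: stands in for MLX5_IB_NO_PH, whose value this diff does not show. */
#define NO_PH 0xFF

/* Swap 0 and NO_PH (a self-inverse map) so "no hint" encodes as zero. */
static uint8_t ph_normalize(uint8_t ph)
{
	if (ph == NO_PH || ph == 0)
		ph ^= NO_PH;
	return ph;
}

/* vendor_key: bit 0 selects KSM instead of MTT. */
uint64_t make_vendor_key(int is_ksm)
{
	return is_ksm ? KEY_ACCESS_MODE_KSM_MASK : 0;
}

/* kernel_vendor_key: st_index in bits 0-15, normalized PH in bits 16-23. */
uint64_t pack_kernel_key(uint16_t st_index, uint8_t ph)
{
	return (uint64_t)st_index |
	       ((uint64_t)ph_normalize(ph) << KERNEL_KEY_PH_SHIFT);
}

/* Mask, then shift down before narrowing PH to 8 bits. */
void unpack_kernel_key(uint64_t key, uint16_t *st_index, uint8_t *ph_field)
{
	*st_index = (uint16_t)(key & KERNEL_KEY_ST_INDEX_MASK);
	*ph_field = (uint8_t)((key & KERNEL_KEY_PH_MASK) >> KERNEL_KEY_PH_SHIFT);
}

/* Split a 5-bit access mode into the access_mode_1_0 and access_mode_4_2 fields. */
void split_access_mode(unsigned int mode, unsigned int *lo2, unsigned int *hi3)
{
	*lo2 = mode & 0x3;
	*hi3 = (mode >> 2) & 0x7;
}
```

Applying ph_normalize() again to a decoded field recovers the caller's original hint, since the swap is its own inverse.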

+1 -1

drivers/infiniband/hw/mlx5/qos.c

···
45 45 return -EINVAL;
46 46
47 47 dev = to_mdev(c->ibucontext.device);
48 - pp_entry = kzalloc(sizeof(*pp_entry), GFP_KERNEL);
48 + pp_entry = kzalloc_obj(*pp_entry);
49 49 if (!pp_entry)
50 50 return -ENOMEM;
51 51

+18 -50

drivers/infiniband/hw/mlx5/qp.c

···
1273 1273 }
1274 1274 return MLX5_TIMESTAMP_FORMAT_REAL_TIME;
1275 1275 }
1276 - if (cq->create_flags & IB_UVERBS_CQ_FLAGS_TIMESTAMP_COMPLETION) {
1276 + if (cq->private_flags & MLX5_IB_CQ_PR_TIMESTAMP_COMPLETION) {
1277 1277 if (!fr_sup) {
1278 1278 mlx5_ib_dbg(dev,
1279 1279 "Free running TS format is not supported\n");
···
4692 4692 struct mlx5_ib_dev *dev = to_mdev(ibqp->device);
4693 4693 struct mlx5_ib_modify_qp_resp resp = {};
4694 4694 struct mlx5_ib_qp *qp = to_mqp(ibqp);
4695 - struct mlx5_ib_modify_qp ucmd = {};
4695 + struct mlx5_ib_modify_qp ucmd;
4696 4696 enum ib_qp_type qp_type;
4697 4697 enum ib_qp_state cur_state, new_state;
4698 4698 int err = -EINVAL;
···
4707 4707 return -ENOSYS;
4708 4708
4709 4709 if (udata && udata->inlen) {
4710 - if (udata->inlen < offsetofend(typeof(ucmd), ece_options))
4711 - return -EINVAL;
4710 + err = ib_copy_validate_udata_in_cm(udata, ucmd, ece_options,
4711 + MLX5_IB_MODIFY_QP_OOO_DP);
4712 + if (err)
4713 + return err;
4712 4714
4713 - if (udata->inlen > sizeof(ucmd) &&
4714 - !ib_is_udata_cleared(udata, sizeof(ucmd),
4715 - udata->inlen - sizeof(ucmd)))
4716 - return -EOPNOTSUPP;
4717 -
4718 - if (ib_copy_from_udata(&ucmd, udata,
4719 - min(udata->inlen, sizeof(ucmd))))
4720 - return -EFAULT;
4721 -
4722 - if (ucmd.comp_mask & ~MLX5_IB_MODIFY_QP_OOO_DP ||
4723 - memchr_inv(&ucmd.burst_info.reserved, 0,
4715 + if (memchr_inv(&ucmd.burst_info.reserved, 0,
4724 4716 sizeof(ucmd.burst_info.reserved)))
4725 4717 return -EOPNOTSUPP;
4726 4718
···
5379 5387 struct mlx5_ib_rwq *rwq)
5380 5388 {
5381 5389 struct mlx5_ib_dev *dev = to_mdev(pd->device);
5382 - struct mlx5_ib_create_wq ucmd = {};
5390 + struct mlx5_ib_create_wq ucmd;
5383 5391 int err;
5384 - size_t required_cmd_sz;
5385 5392
5386 - required_cmd_sz = offsetofend(struct mlx5_ib_create_wq,
5387 - single_stride_log_num_of_bytes);
5388 - if (udata->inlen < required_cmd_sz) {
5389 - mlx5_ib_dbg(dev, "invalid inlen\n");
5390 - return -EINVAL;
5391 - }
5392 -
5393 - if (udata->inlen > sizeof(ucmd) &&
5394 - !ib_is_udata_cleared(udata, sizeof(ucmd),
5395 - udata->inlen - sizeof(ucmd))) {
5396 - mlx5_ib_dbg(dev, "inlen is not supported\n");
5397 - return -EOPNOTSUPP;
5398 - }
5399 -
5400 - if (ib_copy_from_udata(&ucmd, udata, min(sizeof(ucmd), udata->inlen))) {
5393 + err = ib_copy_validate_udata_in_cm(udata, ucmd,
5394 + single_stride_log_num_of_bytes,
5395 + MLX5_IB_CREATE_WQ_STRIDING_RQ);
5396 + if (err) {
5401 5397 mlx5_ib_dbg(dev, "copy failed\n");
5402 - return -EFAULT;
5398 + return err;
5403 5399 }
5404 5400
5405 - if (ucmd.comp_mask & (~MLX5_IB_CREATE_WQ_STRIDING_RQ)) {
5406 - mlx5_ib_dbg(dev, "invalid comp mask\n");
5407 - return -EOPNOTSUPP;
5408 - } else if (ucmd.comp_mask & MLX5_IB_CREATE_WQ_STRIDING_RQ) {
5401 + if (ucmd.comp_mask & MLX5_IB_CREATE_WQ_STRIDING_RQ) {
5409 5402 if (!MLX5_CAP_GEN(dev->mdev, striding_rq)) {
5410 5403 mlx5_ib_dbg(dev, "Striding RQ is not supported\n");
5411 5404 return -EOPNOTSUPP;
···
5603 5626 struct mlx5_ib_dev *dev = to_mdev(wq->device);
5604 5627 struct mlx5_ib_rwq *rwq = to_mrwq(wq);
5605 5628 struct mlx5_ib_modify_wq ucmd = {};
5606 - size_t required_cmd_sz;
5607 5629 int curr_wq_state;
5608 5630 int wq_state;
5609 5631 int inlen;
···
5610 5634 void *rqc;
5611 5635 void *in;
5612 5636
5613 - required_cmd_sz = offsetofend(struct mlx5_ib_modify_wq, reserved);
5614 - if (udata->inlen < required_cmd_sz)
5615 - return -EINVAL;
5637 + err = ib_copy_validate_udata_in_cm(udata, ucmd, reserved, 0);
5638 + if (err)
5639 + return err;
5616 5640
5617 - if (udata->inlen > sizeof(ucmd) &&
5618 - !ib_is_udata_cleared(udata, sizeof(ucmd),
5619 - udata->inlen - sizeof(ucmd)))
5620 - return -EOPNOTSUPP;
5621 -
5622 - if (ib_copy_from_udata(&ucmd, udata, min(sizeof(ucmd), udata->inlen)))
5623 - return -EFAULT;
5624 -
5625 - if (ucmd.comp_mask || ucmd.reserved)
5641 + if (ucmd.reserved)
5626 5642 return -EOPNOTSUPP;
5627 5643
5628 5644 inlen = MLX5_ST_SZ_BYTES(modify_rq_in);
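Across qp.c, and likewise in mr.c and srq.c, open-coded user-command handling is folded into ib_copy_validate_udata_in_cm() and ib_copy_validate_udata_in(). The removed lines show the sequence these helpers stand in for:
1. Require at least offsetofend(struct, last_required_field) bytes.
2. Reject non-zero bytes beyond sizeof(struct).
3. Copy min(inlen, sizeof(struct)).
4. Reject comp_mask bits outside the allowed set.

The helpers' exact error codes and check order are not visible in this diff. The sketch below is therefore a hypothetical userspace model (copy_validate_in(), struct demo_cmd) of the removed sequence, not the kernel implementation. It zero-fills the destination first, which the removed = {} initializers used to guarantee.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define offsetofend(TYPE, MEMBER) \
	(offsetof(TYPE, MEMBER) + sizeof(((TYPE *)0)->MEMBER))

/* Hypothetical command layout: comp_mask first, newest field last. */
struct demo_cmd {
	uint32_t comp_mask;
	uint32_t flags;
	uint64_t ext; /* later extension; older userspace may omit it */
};

/*
 * Validate a user command of inlen bytes and copy it into out. At least
 * min_len bytes are required, and only allowed_cm bits may be set in
 * comp_mask.
 */
int copy_validate_in(struct demo_cmd *out, const void *in, size_t inlen,
		     size_t min_len, uint32_t allowed_cm)
{
	const unsigned char *bytes = in;
	size_t i;

	if (inlen < min_len)
		return -EINVAL;             /* too short for required fields */
	for (i = sizeof(*out); i < inlen; i++)
		if (bytes[i])
			return -EOPNOTSUPP; /* unknown trailing fields in use */

	memset(out, 0, sizeof(*out));       /* fields past inlen read as 0 */
	memcpy(out, in, inlen < sizeof(*out) ? inlen : sizeof(*out));

	if (out->comp_mask & ~allowed_cm)
		return -EOPNOTSUPP;         /* unsupported extension requested */
	return 0;
}
```

Checking trailing bytes before copying means a newer userspace that sets fields this kernel does not know about gets -EOPNOTSUPP instead of having them silently ignored.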

+4 -13

drivers/infiniband/hw/mlx5/srq.c

···
45 45 struct ib_udata *udata, int buf_size)
46 46 {
47 47 struct mlx5_ib_dev *dev = to_mdev(pd->device);
48 - struct mlx5_ib_create_srq ucmd = {};
48 + struct mlx5_ib_create_srq ucmd;
49 49 struct mlx5_ib_ucontext *ucontext = rdma_udata_to_drv_context(
50 50 udata, struct mlx5_ib_ucontext, ibucontext);
51 - size_t ucmdlen;
52 51 int err;
53 52 u32 uidx = MLX5_IB_DEFAULT_UIDX;
54 53
55 - ucmdlen = min(udata->inlen, sizeof(ucmd));
56 -
57 - if (ib_copy_from_udata(&ucmd, udata, ucmdlen)) {
58 - mlx5_ib_dbg(dev, "failed copy udata\n");
59 - return -EFAULT;
60 - }
54 + err = ib_copy_validate_udata_in(udata, ucmd, flags);
55 + if (err)
56 + return err;
61 57
62 58 if (ucmd.reserved0 || ucmd.reserved1)
63 - return -EINVAL;
64 -
65 - if (udata->inlen > sizeof(ucmd) &&
66 - !ib_is_udata_cleared(udata, sizeof(ucmd),
67 - udata->inlen - sizeof(ucmd)))
68 59 return -EINVAL;
69 60
70 61 if (in->type != IB_SRQT_BASIC) {

+1

drivers/infiniband/hw/mlx5/umr.c

+1

drivers/infiniband/hw/mlx5/umr.h

···
 
 #define MLX5_MAX_UMR_SHIFT 16
 #define MLX5_MAX_UMR_PAGES (1 << MLX5_MAX_UMR_SHIFT)
+#define MLX5_MAX_UMR_EXTENDED_SHIFT 43
 
 #define MLX5_IB_UMR_OCTOWORD 16
 #define MLX5_IB_UMR_XLT_ALIGNMENT 64

+21 -15

drivers/infiniband/hw/mthca/mthca_provider.c

···
  */
 
 #include <rdma/ib_smi.h>
-#include <rdma/ib_umem.h>
 #include <rdma/ib_user_verbs.h>
+#include <rdma/iter.h>
 #include <rdma/uverbs_ioctl.h>
 
 #include <linux/sched.h>
···
 		return -EOPNOTSUPP;
 
 	if (udata) {
-		if (ib_copy_from_udata(&ucmd, udata, sizeof(ucmd)))
-			return -EFAULT;
+		err = ib_copy_validate_udata_in(udata, ucmd, db_page);
+		if (err)
+			return err;
 
 		err = mthca_map_user_db(to_mdev(ibsrq->device), &context->uar,
 					context->db_tab, ucmd.db_index,
···
 	case IB_QPT_UD:
 	{
 		if (udata) {
-			if (ib_copy_from_udata(&ucmd, udata, sizeof(ucmd)))
-				return -EFAULT;
+			err = ib_copy_validate_udata_in(udata, ucmd, rq_db_index);
+			if (err)
+				return err;
 
 			err = mthca_map_user_db(dev, &context->uar,
 						context->db_tab,
···
 		return -EINVAL;
 
 	if (udata) {
-		if (ib_copy_from_udata(&ucmd, udata, sizeof(ucmd)))
-			return -EFAULT;
+		err = ib_copy_validate_udata_in(udata, ucmd, set_db_index);
+		if (err)
+			return err;
 
 		err = mthca_map_user_db(to_mdev(ibdev), &context->uar,
 					context->db_tab, ucmd.set_db_index,
···
 	return 0;
 }
 
-static int mthca_resize_cq(struct ib_cq *ibcq, int entries, struct ib_udata *udata)
+static int mthca_resize_cq(struct ib_cq *ibcq, unsigned int entries,
+			   struct ib_udata *udata)
 {
 	struct mthca_dev *dev = to_mdev(ibcq->device);
 	struct mthca_cq *cq = to_mcq(ibcq);
···
 	u32 lkey;
 	int ret;
 
-	if (entries < 1 || entries > dev->limits.max_cqes)
+	if (entries > dev->limits.max_cqes)
 		return -EINVAL;
 
 	mutex_lock(&cq->mutex);
···
 			goto out;
 		lkey = cq->resize_buf->buf.mr.ibmr.lkey;
 	} else {
-		if (ib_copy_from_udata(&ucmd, udata, sizeof ucmd)) {
-			ret = -EFAULT;
+		ret = ib_copy_validate_udata_in(udata, ucmd, reserved);
+		if (ret)
 			goto out;
-		}
 		lkey = ucmd.lkey;
 	}
 
···
 		}
 		++context->reg_mr_warned;
 		ucmd.mr_attrs = 0;
-	} else if (ib_copy_from_udata(&ucmd, udata, sizeof ucmd))
-		return ERR_PTR(-EFAULT);
+	} else {
+		err = ib_copy_validate_udata_in(udata, ucmd, reserved);
+		if (err)
+			return ERR_PTR(err);
+	}
 
 	mr = kmalloc_obj(*mr);
 	if (!mr)
···
 	.query_port = mthca_query_port,
 	.query_qp = mthca_query_qp,
 	.reg_user_mr = mthca_reg_user_mr,
-	.resize_cq = mthca_resize_cq,
+	.resize_user_cq = mthca_resize_cq,
 
 	INIT_RDMA_OBJ_SIZE(ib_ah, mthca_ah, ibah),
 	INIT_RDMA_OBJ_SIZE(ib_cq, mthca_cq, ibcq),

+1 -1

drivers/infiniband/hw/ocrdma/ocrdma.h

···
 	struct ib_mr ibmr;
 	struct ib_umem *umem;
 	struct ocrdma_hw_mr hwmr;
-	u64 *pages;
 	u32 npages;
+	u64 pages[];
 };
 
 struct ocrdma_stats {
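Turning `pages` into a flexible array member means the MR header and its page list come from a single allocation (`kzalloc_flex(*mr, pages, max_num_sg)` in ocrdma_verbs.c), which is why the separate `kcalloc()` and the `pbl_err`/`kfree(mr->pages)` unwinding disappear there. A userspace analogue using `calloc()`, with a hypothetical `struct demo_mr`; the size-overflow check is this sketch's own and is not a statement about how the kernel helper computes its size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct ocrdma_mr: header plus trailing page list. */
struct demo_mr {
	size_t max_pages;
	uint64_t pages[]; /* flexible array member, sized at allocation time */
};

/* One zeroed allocation covering the header and 'n' page entries. */
static struct demo_mr *demo_mr_alloc(size_t n)
{
	struct demo_mr *mr;

	/* Reject counts whose byte size would overflow size_t. */
	if (n > (SIZE_MAX - sizeof(*mr)) / sizeof(mr->pages[0]))
		return NULL;

	mr = calloc(1, sizeof(*mr) + n * sizeof(mr->pages[0]));
	if (!mr)
		return NULL;
	mr->max_pages = n;
	return mr; /* a single free(mr) releases header and pages together */
}
```

The same file's switch from `kzalloc_objs(struct ocrdma_pbl, ...)` to `kzalloc_objs(*mr->pbl_table, ...)` follows the related `sizeof(*ptr)` idiom: the element type is taken from the destination pointer, so it cannot drift out of sync with the declaration.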

+1 -1

drivers/infiniband/hw/ocrdma/ocrdma_main.c

···
 	.query_qp = ocrdma_query_qp,
 	.reg_user_mr = ocrdma_reg_user_mr,
 	.req_notify_cq = ocrdma_arm_cq,
-	.resize_cq = ocrdma_resize_cq,
+	.resize_user_cq = ocrdma_resize_cq,
 
 	INIT_RDMA_OBJ_SIZE(ib_ah, ocrdma_ah, ibah),
 	INIT_RDMA_OBJ_SIZE(ib_cq, ocrdma_cq, ibcq),

+24 -32

drivers/infiniband/hw/ocrdma/ocrdma_verbs.c

···
 #include <rdma/ib_verbs.h>
 #include <rdma/ib_user_verbs.h>
 #include <rdma/iw_cm.h>
-#include <rdma/ib_umem.h>
 #include <rdma/ib_addr.h>
 #include <rdma/ib_cache.h>
+#include <rdma/iter.h>
 #include <rdma/uverbs_ioctl.h>
 
 #include "ocrdma.h"
···
 	void *va;
 	dma_addr_t pa;
 
-	mr->pbl_table = kzalloc_objs(struct ocrdma_pbl, mr->num_pbls);
+	mr->pbl_table = kzalloc_objs(*mr->pbl_table, mr->num_pbls);
 
 	if (!mr->pbl_table)
 		return -ENOMEM;
···
 
 	(void) ocrdma_mbx_dealloc_lkey(dev, mr->hwmr.fr_mr, mr->hwmr.lkey);
 
-	kfree(mr->pages);
 	ocrdma_free_mr_pbl_tbl(dev, &mr->hwmr);
 
 	/* it could be user registered memory. */
···
 		return -EOPNOTSUPP;
 
 	if (udata) {
-		if (ib_copy_from_udata(&ureq, udata, sizeof(ureq)))
-			return -EFAULT;
+		status = ib_copy_validate_udata_in(udata, ureq, rsvd);
+		if (status)
+			return status;
 	} else
 		ureq.dpp_cq = 0;
 
···
 	return status;
 }
 
-int ocrdma_resize_cq(struct ib_cq *ibcq, int new_cnt,
+int ocrdma_resize_cq(struct ib_cq *ibcq, unsigned int new_cnt,
 		     struct ib_udata *udata)
 {
-	int status = 0;
 	struct ocrdma_cq *cq = get_ocrdma_cq(ibcq);
 
-	if (new_cnt < 1 || new_cnt > cq->max_hw_cqe) {
-		status = -EINVAL;
-		return status;
-	}
+	if (new_cnt > cq->max_hw_cqe)
+		return -EINVAL;
+
 	ibcq->cqe = new_cnt;
-	return status;
+	return 0;
 }
 
 static void ocrdma_flush_cq(struct ocrdma_cq *cq)
···
 
 static int ocrdma_alloc_wr_id_tbl(struct ocrdma_qp *qp)
 {
-	qp->wqe_wr_id_tbl =
-	    kzalloc_objs(*(qp->wqe_wr_id_tbl), qp->sq.max_cnt);
+	qp->wqe_wr_id_tbl = kzalloc_objs(*qp->wqe_wr_id_tbl, qp->sq.max_cnt);
 	if (qp->wqe_wr_id_tbl == NULL)
 		return -ENOMEM;
-	qp->rqe_wr_id_tbl =
-	    kcalloc(qp->rq.max_cnt, sizeof(u64), GFP_KERNEL);
+
+	qp->rqe_wr_id_tbl = kzalloc_objs(*qp->rqe_wr_id_tbl, qp->rq.max_cnt);
 	if (qp->rqe_wr_id_tbl == NULL)
 		return -ENOMEM;
 
···
 	if (status)
 		goto gen_err;
 
-	memset(&ureq, 0, sizeof(ureq));
 	if (udata) {
-		if (ib_copy_from_udata(&ureq, udata, sizeof(ureq)))
-			return -EFAULT;
+		status = ib_copy_validate_udata_in(udata, ureq, rsvd1);
+		if (status)
+			return status;
+	} else {
+		memset(&ureq, 0, sizeof(ureq));
 	}
+
 	ocrdma_set_qp_init_params(qp, pd, attrs);
 	if (udata == NULL)
 		qp->cap_flags |= (OCRDMA_QP_MW_BIND | OCRDMA_QP_LKEY0 |
···
 		return status;
 
 	if (!udata) {
-		srq->rqe_wr_id_tbl = kcalloc(srq->rq.max_cnt, sizeof(u64),
-					     GFP_KERNEL);
+		srq->rqe_wr_id_tbl =
+			kzalloc_objs(*srq->rqe_wr_id_tbl, srq->rq.max_cnt);
 		if (!srq->rqe_wr_id_tbl) {
 			status = -ENOMEM;
 			goto arm_err;
···
 	if (max_num_sg > dev->attr.max_pages_per_frmr)
 		return ERR_PTR(-EINVAL);
 
-	mr = kzalloc_obj(*mr);
+	mr = kzalloc_flex(*mr, pages, max_num_sg);
 	if (!mr)
 		return ERR_PTR(-ENOMEM);
 
-	mr->pages = kcalloc(max_num_sg, sizeof(u64), GFP_KERNEL);
-	if (!mr->pages) {
-		status = -ENOMEM;
-		goto pl_err;
-	}
-
 	status = ocrdma_get_pbl_info(dev, mr, max_num_sg);
 	if (status)
-		goto pbl_err;
+		goto pl_err;
 	mr->hwmr.fr_mr = 1;
 	mr->hwmr.remote_rd = 0;
 	mr->hwmr.remote_wr = 0;
···
 	mr->hwmr.mw_bind = 0;
 	status = ocrdma_build_pbl_tbl(dev, &mr->hwmr);
 	if (status)
-		goto pbl_err;
+		goto pl_err;
 	status = ocrdma_reg_mr(dev, &mr->hwmr, pd->id, 0);
 	if (status)
 		goto mbx_err;
···
 	return &mr->ibmr;
 mbx_err:
 	ocrdma_free_mr_pbl_tbl(dev, &mr->hwmr);
-pbl_err:
-	kfree(mr->pages);
 pl_err:
 	kfree(mr);
 	return ERR_PTR(-ENOMEM);

+1 -1

drivers/infiniband/hw/ocrdma/ocrdma_verbs.h

···
 
 int ocrdma_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
 		     struct uverbs_attr_bundle *attrs);
-int ocrdma_resize_cq(struct ib_cq *, int cqe, struct ib_udata *);
+int ocrdma_resize_cq(struct ib_cq *, unsigned int cqe, struct ib_udata *);
 int ocrdma_destroy_cq(struct ib_cq *ibcq, struct ib_udata *udata);
 
 int ocrdma_create_qp(struct ib_qp *qp, struct ib_qp_init_attr *attrs,

+16 -28

drivers/infiniband/hw/qedr/verbs.c

···
 #include <rdma/ib_verbs.h>
 #include <rdma/ib_user_verbs.h>
 #include <rdma/iw_cm.h>
-#include <rdma/ib_umem.h>
 #include <rdma/ib_addr.h>
 #include <rdma/ib_cache.h>
+#include <rdma/iter.h>
 #include <rdma/uverbs_ioctl.h>
 
 #include <linux/qed/common_hsi.h>
···
 	int rc;
 	struct qedr_ucontext *ctx = get_qedr_ucontext(uctx);
 	struct qedr_alloc_ucontext_resp uresp = {};
-	struct qedr_alloc_ucontext_req ureq = {};
+	struct qedr_alloc_ucontext_req ureq;
 	struct qedr_dev *dev = get_qedr_dev(ibdev);
 	struct qed_rdma_add_user_out_params oparams;
 	struct qedr_user_mmap_entry *entry;
···
 		return -EFAULT;
 
 	if (udata->inlen) {
-		rc = ib_copy_from_udata(&ureq, udata,
-					min(sizeof(ureq), udata->inlen));
-		if (rc) {
-			DP_ERR(dev, "Problem copying data from user space\n");
-			return -EFAULT;
-		}
+		rc = ib_copy_validate_udata_in(udata, ureq, reserved);
+		if (rc)
+			return rc;
 		ctx->edpm_mode = !!(ureq.context_flags &
 				    QEDR_ALLOC_UCTX_EDPM_MODE);
 		ctx->db_rec = !!(ureq.context_flags & QEDR_ALLOC_UCTX_DB_REC);
···
 	};
 	struct qedr_dev *dev = get_qedr_dev(ibdev);
 	struct qed_rdma_create_cq_in_params params;
-	struct qedr_create_cq_ureq ureq = {};
+	struct qedr_create_cq_ureq ureq;
 	int vector = attr->comp_vector;
 	int entries = attr->cqe;
 	struct qedr_cq *cq = get_qedr_cq(ibcq);
···
 	db_offset = DB_ADDR_SHIFT(DQ_PWM_OFFSET_UCM_RDMA_CQ_CONS_32BIT);
 
 	if (udata) {
-		if (ib_copy_from_udata(&ureq, udata, min(sizeof(ureq),
-							  udata->inlen))) {
-			DP_ERR(dev,
-			       "create cq: problem copying data from user space\n");
-			goto err0;
-		}
+		rc = ib_copy_validate_udata_in(udata, ureq, len);
+		if (rc)
+			return rc;
 
 		if (!ureq.len) {
 			DP_ERR(dev,
···
 	struct qedr_dev *dev = get_qedr_dev(ibsrq->device);
 	struct qed_rdma_create_srq_out_params out_params;
 	struct qedr_pd *pd = get_qedr_pd(ibsrq->pd);
-	struct qedr_create_srq_ureq ureq = {};
+	struct qedr_create_srq_ureq ureq;
 	u64 pbl_base_addr, phy_prod_pair_addr;
 	struct qedr_srq_hwq_info *hw_srq;
 	u32 page_cnt, page_size;
···
 	hw_srq->max_sges = init_attr->attr.max_sge;
 
 	if (udata) {
-		if (ib_copy_from_udata(&ureq, udata, min(sizeof(ureq),
-							  udata->inlen))) {
-			DP_ERR(dev,
-			       "create srq: problem copying data from user space\n");
-			goto err0;
-		}
+		rc = ib_copy_validate_udata_in(udata, ureq, srq_len);
+		if (rc)
+			return rc;
 
 		rc = qedr_init_srq_user_params(udata, srq, &ureq, 0);
 		if (rc)
···
 	struct qed_rdma_create_qp_in_params in_params;
 	struct qed_rdma_create_qp_out_params out_params;
 	struct qedr_create_qp_uresp uresp = {};
-	struct qedr_create_qp_ureq ureq = {};
+	struct qedr_create_qp_ureq ureq;
 	int alloc_and_init = rdma_protocol_roce(&dev->ibdev, 1);
 	struct qedr_ucontext *ctx = NULL;
 	struct qedr_pd *pd = NULL;
···
 	}
 
 	if (udata) {
-		rc = ib_copy_from_udata(&ureq, udata, min(sizeof(ureq),
-						udata->inlen));
-		if (rc) {
-			DP_ERR(dev, "Problem copying data from user space\n");
+		rc = ib_copy_validate_udata_in(udata, ureq, rq_len);
+		if (rc)
 			return rc;
-		}
 	}
 
 	if (qedr_qp_has_sq(qp)) {

+1 -1

drivers/infiniband/hw/usnic/usnic_ib_verbs.c

···
 	if (init_attr->create_flags)
 		return -EOPNOTSUPP;
 
-	err = ib_copy_from_udata(&cmd, udata, sizeof(cmd));
+	err = ib_copy_validate_udata_in(udata, cmd, spec);
 	if (err) {
 		usnic_err("%s: cannot copy udata for create_qp\n",
 			  dev_name(&us_ibdev->ib_dev.dev));

+1 -1

drivers/infiniband/hw/vmw_pvrdma/pvrdma.h

···
 #include <linux/pci.h>
 #include <linux/semaphore.h>
 #include <linux/workqueue.h>
-#include <rdma/ib_umem.h>
 #include <rdma/ib_verbs.h>
+#include <rdma/iter.h>
 #include <rdma/vmw_pvrdma-abi.h>
 
 #include "pvrdma_ring.h"

+2 -3

drivers/infiniband/hw/vmw_pvrdma/pvrdma_cq.c

···
 	cq->is_kernel = !udata;
 
 	if (!cq->is_kernel) {
-		if (ib_copy_from_udata(&ucmd, udata, sizeof(ucmd))) {
-			ret = -EFAULT;
+		ret = ib_copy_validate_udata_in(udata, ucmd, reserved);
+		if (ret)
 			goto err_cq;
-		}
 
 		cq->umem = ib_umem_get(ibdev, ucmd.buf_addr, ucmd.buf_size,
 				       IB_ACCESS_LOCAL_WRITE);

+1 -2

drivers/infiniband/hw/vmw_pvrdma/pvrdma_misc.c

···
 		goto err;
 
 	pdir->ntables = PVRDMA_PAGE_DIR_TABLE(npages - 1) + 1;
-	pdir->tables = kcalloc(pdir->ntables, sizeof(*pdir->tables),
-			       GFP_KERNEL);
+	pdir->tables = kzalloc_objs(*pdir->tables, pdir->ntables);
 	if (!pdir->tables)
 		goto err;
 

+3 -3

drivers/infiniband/hw/vmw_pvrdma/pvrdma_qp.c

···
 #include <rdma/ib_addr.h>
 #include <rdma/ib_smi.h>
 #include <rdma/ib_user_verbs.h>
+#include <rdma/uverbs_ioctl.h>
 
 #include "pvrdma.h"
 
···
 			dev_dbg(&dev->pdev->dev,
 				"create queuepair from user space\n");
 
-			if (ib_copy_from_udata(&ucmd, udata, sizeof(ucmd))) {
-				ret = -EFAULT;
+			ret = ib_copy_validate_udata_in(udata, ucmd, qp_addr);
+			if (ret)
 				goto err_qp;
-			}
 
 			/* Userspace supports qpn and qp handles? */
 			if (dev->dsr_version >= PVRDMA_QPHANDLE_VERSION &&

+3 -3

drivers/infiniband/hw/vmw_pvrdma/pvrdma_srq.c

···
 #include <rdma/ib_addr.h>
 #include <rdma/ib_smi.h>
 #include <rdma/ib_user_verbs.h>
+#include <rdma/uverbs_ioctl.h>
 
 #include "pvrdma.h"
 
···
 	dev_dbg(&dev->pdev->dev,
 		"create shared receive queue from user space\n");
 
-	if (ib_copy_from_udata(&ucmd, udata, sizeof(ucmd))) {
-		ret = -EFAULT;
+	ret = ib_copy_validate_udata_in(udata, ucmd, reserved);
+	if (ret)
 		goto err_srq;
-	}
 
 	srq->umem = ib_umem_get(ibsrq->device, ucmd.buf_addr, ucmd.buf_size, 0);
 	if (IS_ERR(srq->umem)) {

+2 -2

drivers/infiniband/sw/rdmavt/cq.c

···
  *
  * Return: 0 for success.
  */
-int rvt_resize_cq(struct ib_cq *ibcq, int cqe, struct ib_udata *udata)
+int rvt_resize_cq(struct ib_cq *ibcq, unsigned int cqe, struct ib_udata *udata)
 {
 	struct rvt_cq *cq = ibcq_to_rvtcq(ibcq);
 	u32 head, tail, n;
···
 	struct rvt_k_cq_wc *k_wc = NULL;
 	struct rvt_k_cq_wc *old_k_wc = NULL;
 
-	if (cqe < 1 || cqe > rdi->dparms.props.max_cqe)
+	if (cqe > rdi->dparms.props.max_cqe)
 		return -EINVAL;
 
 	/*
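In mthca, ocrdma and rdmavt the resize count becomes `unsigned int` and only the upper-bound test survives. Under the C conversion rules a negative `int` passed in would wrap to a large unsigned value and still fail the `> max` comparison, while zero is no longer rejected by these driver checks; the diff alone does not show where, or whether, a zero count is refused earlier. The sketch compares the two check shapes; `DEMO_MAX_CQE` is a hypothetical stand-in for the per-device limit.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical limit standing in for rdi->dparms.props.max_cqe. */
#define DEMO_MAX_CQE 4096u

/* Old shape: signed count, both bounds checked. */
static int check_old(int cqe)
{
	if (cqe < 1 || cqe > (int)DEMO_MAX_CQE)
		return -EINVAL;
	return 0;
}

/* New shape: unsigned count, only the upper bound checked. */
static int check_new(unsigned int cqe)
{
	if (cqe > DEMO_MAX_CQE)
		return -EINVAL;
	return 0;
}
```

The asymmetry to keep in mind is zero: `check_old(0)` fails, `check_new(0)` passes.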

+1 -1

drivers/infiniband/sw/rdmavt/cq.h

···
 			  struct uverbs_attr_bundle *attrs);
 int rvt_destroy_cq(struct ib_cq *ibcq, struct ib_udata *udata);
 int rvt_req_notify_cq(struct ib_cq *ibcq, enum ib_cq_notify_flags notify_flags);
-int rvt_resize_cq(struct ib_cq *ibcq, int cqe, struct ib_udata *udata);
+int rvt_resize_cq(struct ib_cq *ibcq, unsigned int cqe, struct ib_udata *udata);
 int rvt_poll_cq(struct ib_cq *ibcq, int num_entries, struct ib_wc *entry);
 int rvt_driver_cq_init(void);
 void rvt_cq_exit(void);

-1

drivers/infiniband/sw/rdmavt/mcast.c

···
 {
 	struct rvt_qp *qp = mqp->qp;
 
-	/* Notify hfi1_destroy_qp() if it is waiting. */
 	rvt_put_qp(qp);
 
 	kfree(mqp);

+17 -5

drivers/infiniband/sw/rdmavt/mmap.c

···
 #include <rdma/uverbs_ioctl.h>
 #include "mmap.h"
 
+/* number of reserved mmaps for the driver */
+#define MMAP_RESERVED 256
+/* start point for dynamic offsets */
+#define MMAP_OFFSET_START (MMAP_RESERVED * PAGE_SIZE)
+
 /**
  * rvt_mmap_init - init link list and lock for mem map
  * @rdi: rvt dev struct
···
 {
 	INIT_LIST_HEAD(&rdi->pending_mmaps);
 	spin_lock_init(&rdi->pending_lock);
-	rdi->mmap_offset = PAGE_SIZE;
+	rdi->mmap_offset = MMAP_OFFSET_START;
 	spin_lock_init(&rdi->mmap_offset_lock);
 }
 
···
 	unsigned long size = vma->vm_end - vma->vm_start;
 	struct rvt_mmap_info *ip, *pp;
 	int ret = -EINVAL;
+
+	/* call driver if in reserved range */
+	if (offset < MMAP_OFFSET_START) {
+		if (rdi->driver_f.mmap)
+			return rdi->driver_f.mmap(context, vma);
+		return -EINVAL;
+	}
 
 	/*
 	 * Search the device's list of objects waiting for a mmap call.
···
 
 	spin_lock_irq(&rdi->mmap_offset_lock);
 	if (rdi->mmap_offset == 0)
-		rdi->mmap_offset = ALIGN(PAGE_SIZE, SHMLBA);
+		rdi->mmap_offset = MMAP_OFFSET_START;
 	ip->offset = rdi->mmap_offset;
-	rdi->mmap_offset += ALIGN(size, SHMLBA);
+	rdi->mmap_offset += PAGE_SIZE;
 	spin_unlock_irq(&rdi->mmap_offset_lock);
 
 	INIT_LIST_HEAD(&ip->pending_mmaps);
···
 
 	spin_lock_irq(&rdi->mmap_offset_lock);
 	if (rdi->mmap_offset == 0)
-		rdi->mmap_offset = PAGE_SIZE;
+		rdi->mmap_offset = MMAP_OFFSET_START;
 	ip->offset = rdi->mmap_offset;
-	rdi->mmap_offset += size;
+	rdi->mmap_offset += PAGE_SIZE;
 	spin_unlock_irq(&rdi->mmap_offset_lock);
 
 	ip->size = size;
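After this change rdmavt splits the mmap offset space in two: offsets below `MMAP_OFFSET_START` (256 pages) go to the driver's optional `driver_f.mmap` hook, or fail with `-EINVAL` if there is none, while rdmavt's own objects receive offsets from `MMAP_OFFSET_START` upward, one page apart regardless of object size. A single-threaded model of that routing; the 4 KiB page size and all `demo_*` names are assumptions, and the kernel additionally serializes allocation with `mmap_offset_lock`.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_PAGE_SIZE 4096ULL /* assumption: 4 KiB pages */
#define DEMO_MMAP_RESERVED 256ULL
#define DEMO_MMAP_OFFSET_START (DEMO_MMAP_RESERVED * DEMO_PAGE_SIZE)

enum demo_target { DEMO_DRIVER, DEMO_RDMAVT, DEMO_REJECT };

struct demo_dev {
	uint64_t next_offset; /* like rdi->mmap_offset */
	int has_driver_mmap;  /* like rdi->driver_f.mmap != NULL */
};

static void demo_mmap_init(struct demo_dev *d, int has_driver_mmap)
{
	d->next_offset = DEMO_MMAP_OFFSET_START;
	d->has_driver_mmap = has_driver_mmap;
}

/* Hand out one page-sized offset cookie per object; restart if it wrapped to 0. */
static uint64_t demo_alloc_offset(struct demo_dev *d)
{
	uint64_t off;

	if (d->next_offset == 0)
		d->next_offset = DEMO_MMAP_OFFSET_START;
	off = d->next_offset;
	d->next_offset += DEMO_PAGE_SIZE;
	return off;
}

/* Decide who services an mmap request at byte offset 'off'. */
static enum demo_target demo_route(const struct demo_dev *d, uint64_t off)
{
	if (off < DEMO_MMAP_OFFSET_START)
		return d->has_driver_mmap ? DEMO_DRIVER : DEMO_REJECT;
	return DEMO_RDMAVT;
}
```

With 4 KiB pages the first dynamic offset is 256 * 4096 = 1048576, so every offset a driver can reserve stays below anything rdmavt hands out.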

+1 -1

drivers/infiniband/sw/rdmavt/qp.c

···
 			struct rvt_ibport *rvp;
 			int pidx;
 
-			pidx = n % rdi->ibdev.phys_port_cnt;
+			pidx = n / 2; /* QP0 and QP1 */
 			rvp = rdi->ports[pidx];
 			qp = rcu_dereference(rvp->qp[n & 1]);
 		} else {
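The special-QP branch now picks the port with `n / 2` instead of `n % phys_port_cnt`, while `n & 1` still selects QP0 or QP1. The sketch compares both mappings under an assumption drawn from the `/* QP0 and QP1 */` comment: the special slots are numbered `0 .. 2 * ports - 1`, with port 0's QP0 and QP1 first. `struct demo_special` and both functions are hypothetical.

```c
#include <assert.h>

struct demo_special {
	int port; /* index into rdi->ports[] */
	int qpn;  /* 0 for QP0, 1 for QP1 */
};

/* Old mapping: port chosen by n modulo the port count. */
static struct demo_special old_map(int n, int nports)
{
	struct demo_special s = { n % nports, n & 1 };
	return s;
}

/* New mapping: two consecutive special slots (QP0, QP1) per port. */
static struct demo_special new_map(int n)
{
	struct demo_special s = { n / 2, n & 1 };
	return s;
}
```

With two ports the old formula yields (0,0), (1,1), (0,0), (1,1) for n = 0..3, revisiting two QPs and never reaching port 0's QP1 or port 1's QP0. The new one yields (0,0), (0,1), (1,0), (1,1). With a single port the two agree.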

+9 -1

drivers/infiniband/sw/rdmavt/vt.c

···
  */
 static int rvt_alloc_ucontext(struct ib_ucontext *uctx, struct ib_udata *udata)
 {
+	struct rvt_dev_info *rdi = ib_to_rvt(uctx->device);
+
+	if (rdi->driver_f.alloc_ucontext)
+		return rdi->driver_f.alloc_ucontext(uctx, udata);
 	return 0;
 }
 
···
  */
 static void rvt_dealloc_ucontext(struct ib_ucontext *context)
 {
+	struct rvt_dev_info *rdi = ib_to_rvt(context->device);
+
+	if (rdi->driver_f.dealloc_ucontext)
+		rdi->driver_f.dealloc_ucontext(context);
 	return;
 }
 
···
 	.query_srq = rvt_query_srq,
 	.reg_user_mr = rvt_reg_user_mr,
 	.req_notify_cq = rvt_req_notify_cq,
-	.resize_cq = rvt_resize_cq,
+	.resize_user_cq = rvt_resize_cq,
 
 	INIT_RDMA_OBJ_SIZE(ib_ah, rvt_ah, ibah),
 	INIT_RDMA_OBJ_SIZE(ib_cq, rvt_cq, ibcq),
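`rvt_alloc_ucontext()` and `rvt_dealloc_ucontext()` now forward to optional `driver_f` callbacks and keep their previous no-op behaviour when a driver leaves those pointers NULL. This mirrors the reserved-range `driver_f.mmap` hook in mmap.c. The sketch shows the forwarding pattern in isolation; every `demo_*` and `example_*` name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

struct demo_ctx {
	int driver_state;
};

/* Optional per-driver callbacks; NULL means "no driver-specific work". */
struct demo_driver_ops {
	int (*alloc_ctx)(struct demo_ctx *ctx);
	void (*dealloc_ctx)(struct demo_ctx *ctx);
};

static int demo_alloc_ctx(const struct demo_driver_ops *ops, struct demo_ctx *ctx)
{
	if (ops->alloc_ctx)
		return ops->alloc_ctx(ctx);
	return 0; /* default: nothing to set up, as before the change */
}

static void demo_dealloc_ctx(const struct demo_driver_ops *ops, struct demo_ctx *ctx)
{
	if (ops->dealloc_ctx)
		ops->dealloc_ctx(ctx);
}

/* Example driver hooks used to exercise the forwarding paths. */
static int example_alloc(struct demo_ctx *ctx)
{
	ctx->driver_state = 1;
	return 0;
}

static void example_dealloc(struct demo_ctx *ctx)
{
	ctx->driver_state = 0;
}
```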

+2 -1

drivers/infiniband/sw/rxe/Makefile

···
 	rxe_mcast.o \
 	rxe_task.o \
 	rxe_net.o \
-	rxe_hw_counters.o
+	rxe_hw_counters.o \
+	rxe_ns.o
 
 rdma_rxe-$(CONFIG_INFINIBAND_ON_DEMAND_PAGING) += rxe_odp.o

+33 -5

drivers/infiniband/sw/rxe/rxe.c

···
 #include <net/addrconf.h>
 #include "rxe.h"
 #include "rxe_loc.h"
+#include "rxe_net.h"
+#include "rxe_ns.h"
 
 MODULE_AUTHOR("Bob Pearson, Frank Zago, John Groves, Kamal Heib");
 MODULE_DESCRIPTION("Soft RDMA transport");
···
 	port->mtu_cap = ib_mtu_enum_to_int(mtu);
 }
 
+static struct rdma_link_ops rxe_link_ops;
+
 /* called by ifc layer to create new rxe device.
  * The caller should allocate memory for rxe by calling ib_alloc_device.
  */
···
 {
 	rxe_init(rxe, ndev);
 	rxe_set_mtu(rxe, mtu);
+	rxe->ib_dev.link_ops = &rxe_link_ops;
 
 	return rxe_register_device(rxe, ibdev_name, ndev);
 }
···
 		goto err;
 	}
 
+	err = rxe_net_init(ndev);
+	if (err)
+		return err;
+
 	err = rxe_net_add(ibdev_name, ndev);
 	if (err) {
 		rxe_err("failed to add %s\n", ndev->name);
···
 	return err;
 }
 
+static int rxe_dellink(struct ib_device *dev)
+{
+	rxe_net_del(dev);
+
+	return 0;
+}
+
 static struct rdma_link_ops rxe_link_ops = {
 	.type = "rxe",
 	.newlink = rxe_newlink,
+	.dellink = rxe_dellink,
 };
 
 static int __init rxe_module_init(void)
···
 	if (err)
 		return err;
 
-	err = rxe_net_init();
-	if (err) {
-		rxe_destroy_wq();
-		return err;
-	}
+	err = rxe_namespace_init();
+	if (err)
+		goto err_destroy_wq;
+
+	err = rxe_register_notifier();
+	if (err)
+		goto err_namespace_exit;
 
 	rdma_link_register(&rxe_link_ops);
+
 	pr_info("loaded\n");
 	return 0;
+
+err_namespace_exit:
+	rxe_namespace_exit();
+err_destroy_wq:
+	rxe_destroy_wq();
+	return err;
 }
 
 static void __exit rxe_module_exit(void)
···
 	ib_unregister_driver(RDMA_DRIVER_RXE);
 	rxe_net_exit();
 	rxe_destroy_wq();
+
+	rxe_namespace_exit();
 
 	pr_info("unloaded\n");
 }

+2

drivers/infiniband/sw/rxe/rxe.h

···
 void rxe_port_down(struct rxe_dev *rxe);
 void rxe_set_port_state(struct rxe_dev *rxe);
 
+extern struct workqueue_struct *rxe_wq;
+
 #endif /* RXE_H */

-31

drivers/infiniband/sw/rxe/rxe_cq.c

···
 #include "rxe_loc.h"
 #include "rxe_queue.h"
 
-int rxe_cq_chk_attr(struct rxe_dev *rxe, struct rxe_cq *cq,
-		    int cqe, int comp_vector)
-{
-	int count;
-
-	if (cqe <= 0) {
-		rxe_dbg_dev(rxe, "cqe(%d) <= 0\n", cqe);
-		goto err1;
-	}
-
-	if (cqe > rxe->attr.max_cqe) {
-		rxe_dbg_dev(rxe, "cqe(%d) > max_cqe(%d)\n",
-			    cqe, rxe->attr.max_cqe);
-		goto err1;
-	}
-
-	if (cq) {
-		count = queue_count(cq->queue, QUEUE_TYPE_TO_CLIENT);
-		if (cqe < count) {
-			rxe_dbg_cq(cq, "cqe(%d) < current # elements in queue (%d)\n",
-				   cqe, count);
-			goto err1;
-		}
-	}
-
-	return 0;
-
-err1:
-	return -EINVAL;
-}
-
 int rxe_cq_from_init(struct rxe_dev *rxe, struct rxe_cq *cq, int cqe,
 		     int comp_vector, struct ib_udata *udata,
 		     struct rxe_create_cq_resp __user *uresp)

-3

drivers/infiniband/sw/rxe/rxe_loc.h

···
 struct rxe_av *rxe_get_av(struct rxe_pkt_info *pkt, struct rxe_ah **ahp);
 
 /* rxe_cq.c */
-int rxe_cq_chk_attr(struct rxe_dev *rxe, struct rxe_cq *cq,
-		    int cqe, int comp_vector);
-
 int rxe_cq_from_init(struct rxe_dev *rxe, struct rxe_cq *cq, int cqe,
 		     int comp_vector, struct ib_udata *udata,
 		     struct rxe_create_cq_resp __user *uresp);

+109 -35

drivers/infiniband/sw/rxe/rxe_net.c

···
 #include "rxe.h"
 #include "rxe_net.h"
 #include "rxe_loc.h"
+#include "rxe_ns.h"
 
-static struct rxe_recv_sockets recv_sockets;
+#ifndef SK_REF_FOR_TUNNEL
+#define SK_REF_FOR_TUNNEL 2
+#endif
 
 #ifdef CONFIG_DEBUG_LOCK_ALLOC
 /*
···
 }
 
 static struct dst_entry *rxe_find_route4(struct rxe_qp *qp,
+					 struct net *net,
 					 struct net_device *ndev,
 					 struct in_addr *saddr,
 					 struct in_addr *daddr)
 {
 	struct rtable *rt;
-	struct flowi4 fl = { { 0 } };
+	struct flowi4 fl = {};
 
-	memset(&fl, 0, sizeof(fl));
 	fl.flowi4_oif = ndev->ifindex;
 	memcpy(&fl.saddr, saddr, sizeof(*saddr));
 	memcpy(&fl.daddr, daddr, sizeof(*daddr));
 	fl.flowi4_proto = IPPROTO_UDP;
 
-	rt = ip_route_output_key(&init_net, &fl);
+	rt = ip_route_output_key(net, &fl);
 	if (IS_ERR(rt)) {
 		rxe_dbg_qp(qp, "no route to %pI4\n", &daddr->s_addr);
 		return NULL;
···
 
 #if IS_ENABLED(CONFIG_IPV6)
 static struct dst_entry *rxe_find_route6(struct rxe_qp *qp,
+					 struct net *net,
 					 struct net_device *ndev,
 					 struct in6_addr *saddr,
 					 struct in6_addr *daddr)
 {
 	struct dst_entry *ndst;
-	struct flowi6 fl6 = { { 0 } };
+	struct flowi6 fl6 = {};
 
-	memset(&fl6, 0, sizeof(fl6));
 	fl6.flowi6_oif = ndev->ifindex;
 	memcpy(&fl6.saddr, saddr, sizeof(*saddr));
 	memcpy(&fl6.daddr, daddr, sizeof(*daddr));
 	fl6.flowi6_proto = IPPROTO_UDP;
 
-	ndst = ip6_dst_lookup_flow(sock_net(recv_sockets.sk6->sk),
-				   recv_sockets.sk6->sk, &fl6,
-				   NULL);
+	ndst = ip6_dst_lookup_flow(net, rxe_ns_pernet_sk6(net), &fl6, NULL);
 	if (IS_ERR(ndst)) {
 		rxe_dbg_qp(qp, "no route to %pI6\n", daddr);
 		return NULL;
···
 #else
 
 static struct dst_entry *rxe_find_route6(struct rxe_qp *qp,
+					 struct net *net,
 					 struct net_device *ndev,
 					 struct in6_addr *saddr,
 					 struct in6_addr *daddr)
···
 				      struct rxe_av *av)
 {
 	struct dst_entry *dst = NULL;
+	struct net *net;
 
 	if (qp_type(qp) == IB_QPT_RC)
 		dst = sk_dst_get(qp->sk->sk);
···
 	if (dst)
 		dst_release(dst);
 
+	net = dev_net(ndev);
+
 	if (av->network_type == RXE_NETWORK_TYPE_IPV4) {
 		struct in_addr *saddr;
 		struct in_addr *daddr;
 
 		saddr = &av->sgid_addr._sockaddr_in.sin_addr;
 		daddr = &av->dgid_addr._sockaddr_in.sin_addr;
-		dst = rxe_find_route4(qp, ndev, saddr, daddr);
+		dst = rxe_find_route4(qp, net, ndev, saddr, daddr);
 	} else if (av->network_type == RXE_NETWORK_TYPE_IPV6) {
 		struct in6_addr *saddr6;
 		struct in6_addr *daddr6;
 
 		saddr6 = &av->sgid_addr._sockaddr_in6.sin6_addr;
 		daddr6 = &av->dgid_addr._sockaddr_in6.sin6_addr;
-		dst = rxe_find_route6(qp, ndev, saddr6, daddr6);
+		dst = rxe_find_route6(qp, net, ndev, saddr6, daddr6);
 #if IS_ENABLED(CONFIG_IPV6)
 		if (dst)
 			qp->dst_cookie =
···
 	return 0;
 }
 
+static void rxe_sock_put(struct sock *sk,
+			 void (*set_sk)(struct net *, struct sock *),
+			 struct net *net)
+{
+	if (refcount_read(&sk->sk_refcnt) > SK_REF_FOR_TUNNEL) {
+		__sock_put(sk);
+	} else {
+		rxe_release_udp_tunnel(sk->sk_socket);
+		sk = NULL;
+		set_sk(net, sk);
+	}
+}
+
+void rxe_net_del(struct ib_device *dev)
+{
+	struct rxe_dev *rxe = container_of(dev, struct rxe_dev, ib_dev);
+	struct net_device *ndev;
+	struct sock *sk;
+	struct net *net;
+
+	ndev = rxe_ib_device_get_netdev(&rxe->ib_dev);
+	if (!ndev)
+		return;
+
+	net = dev_net(ndev);
+
+	sk = rxe_ns_pernet_sk4(net);
+	if (sk)
+		rxe_sock_put(sk, rxe_ns_pernet_set_sk4, net);
+
+	sk = rxe_ns_pernet_sk6(net);
+	if (sk)
+		rxe_sock_put(sk, rxe_ns_pernet_set_sk6, net);
+
+	dev_put(ndev);
+}
+
 static void rxe_port_event(struct rxe_dev *rxe,
 			   enum ib_event_type event)
 {
···
 	switch (event) {
 	case NETDEV_UNREGISTER:
 		ib_unregister_device_queued(&rxe->ib_dev);
+		rxe_net_del(&rxe->ib_dev);
 		break;
 	case NETDEV_CHANGEMTU:
 		rxe_dbg_dev(rxe, "%s changed mtu to %d\n", ndev->name, ndev->mtu);
···
 	.notifier_call = rxe_notify,
 };
 
-static int rxe_net_ipv4_init(void)
+static int rxe_net_ipv4_init(struct net *net)
 {
-	recv_sockets.sk4 = rxe_setup_udp_tunnel(&init_net,
-				htons(ROCE_V2_UDP_DPORT), false);
-	if (IS_ERR(recv_sockets.sk4)) {
-		recv_sockets.sk4 = NULL;
+	struct sock *sk;
+	struct socket *sock;
+
+	sk = rxe_ns_pernet_sk4(net);
+	if (sk) {
+		sock_hold(sk);
+		return 0;
+	}
+
+	sock = rxe_setup_udp_tunnel(net, htons(ROCE_V2_UDP_DPORT), false);
+	if (IS_ERR(sock)) {
 		pr_err("Failed to create IPv4 UDP tunnel\n");
 		return -1;
 	}
+	rxe_ns_pernet_set_sk4(net, sock->sk);
 
 	return 0;
 }
 
-static int rxe_net_ipv6_init(void)
+static int rxe_net_ipv6_init(struct net *net)
 {
 #if IS_ENABLED(CONFIG_IPV6)
+	struct sock *sk;
+	struct socket *sock;
 
-	recv_sockets.sk6 = rxe_setup_udp_tunnel(&init_net,
-				htons(ROCE_V2_UDP_DPORT), true);
-	if (PTR_ERR(recv_sockets.sk6) == -EAFNOSUPPORT) {
-		recv_sockets.sk6 = NULL;
+	sk = rxe_ns_pernet_sk6(net);
+	if (sk) {
+		sock_hold(sk);
+		return 0;
+	}
+
+	sock = rxe_setup_udp_tunnel(net, htons(ROCE_V2_UDP_DPORT), true);
+	if (PTR_ERR(sock) == -EAFNOSUPPORT) {
 		pr_warn("IPv6 is not supported, can not create a UDPv6 socket\n");
 		return 0;
 	}
 
-	if (IS_ERR(recv_sockets.sk6)) {
-		recv_sockets.sk6 = NULL;
+	if (IS_ERR(sock)) {
 		pr_err("Failed to create IPv6 UDP tunnel\n");
 		return -1;
 	}
+
+	rxe_ns_pernet_set_sk6(net, sock->sk);
+
 #endif
+	return 0;
+}
+
+int rxe_register_notifier(void)
+{
+	int err;
+
+	err = register_netdevice_notifier(&rxe_net_notifier);
+	if (err) {
+		pr_err("Failed to register netdev notifier\n");
+		return -1;
+	}
+
 	return 0;
 }
 
 void rxe_net_exit(void)
 {
-	rxe_release_udp_tunnel(recv_sockets.sk6);
-	rxe_release_udp_tunnel(recv_sockets.sk4);
 	unregister_netdevice_notifier(&rxe_net_notifier);
 }
 
-int rxe_net_init(void)
+int rxe_net_init(struct net_device *ndev)
 {
+	struct net *net;
+	struct sock *sk;
 	int err;
 
-	recv_sockets.sk6 = NULL;
+	net = dev_net(ndev);
 
-	err = rxe_net_ipv4_init();
+	err = rxe_net_ipv4_init(net);
 	if (err)
 		return err;
-	err = rxe_net_ipv6_init();
+
+	err = rxe_net_ipv6_init(net);
 	if (err)
 		goto err_out;
-	err = register_netdevice_notifier(&rxe_net_notifier);
-	if (err) {
-		pr_err("Failed to register netdev notifier\n");
-		goto err_out;
-	}
+
 	return 0;
+
 err_out:
-	rxe_net_exit();
+	/* If ipv6 error, release ipv4 resource */
+	sk = rxe_ns_pernet_sk4(net);
+	if (sk)
+		rxe_sock_put(sk, rxe_ns_pernet_set_sk4, net);
+
 	return err;
 }

+3 -6

drivers/infiniband/sw/rxe/rxe_net.h

···
 #include <net/if_inet6.h>
 #include <linux/module.h>
 
-struct rxe_recv_sockets {
-	struct socket *sk4;
-	struct socket *sk6;
-};
-
 int rxe_net_add(const char *ibdev_name, struct net_device *ndev);
+void rxe_net_del(struct ib_device *dev);
 
-int rxe_net_init(void);
+int rxe_register_notifier(void);
+int rxe_net_init(struct net_device *ndev);
 void rxe_net_exit(void);
 
 #endif /* RXE_NET_H */

+124

drivers/infiniband/sw/rxe/rxe_ns.c

···
+/* SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB */
+
+#include <net/sock.h>
+#include <net/netns/generic.h>
+#include <net/net_namespace.h>
+#include <linux/module.h>
+#include <linux/skbuff.h>
+#include <linux/pid_namespace.h>
+#include <net/udp_tunnel.h>
+
+#include "rxe_ns.h"
+
+/*
+ * Per network namespace data
+ */
+struct rxe_ns_sock {
+	struct sock __rcu *rxe_sk4;
+	struct sock __rcu *rxe_sk6;
+};
+
+/*
+ * Index to store custom data for each network namespace.
+ */
+static unsigned int rxe_pernet_id;
+
+/*
+ * Called for every existing and added network namespaces
+ */
+static int rxe_ns_init(struct net *net)
+{
+	/* defer socket create in the namespace to the first
+	 * device create.
+	 */
+
+	return 0;
+}
+
+static void rxe_ns_exit(struct net *net)
+{
+	/* called when the network namespace is removed
+	 */
+	struct rxe_ns_sock *ns_sk = net_generic(net, rxe_pernet_id);
+	struct sock *sk;
+
+	rcu_read_lock();
+	sk = rcu_dereference(ns_sk->rxe_sk4);
+	rcu_read_unlock();
+	if (sk) {
+		rcu_assign_pointer(ns_sk->rxe_sk4, NULL);
+		udp_tunnel_sock_release(sk->sk_socket);
+	}
+
+#if IS_ENABLED(CONFIG_IPV6)
+	rcu_read_lock();
+	sk = rcu_dereference(ns_sk->rxe_sk6);
+	rcu_read_unlock();
+	if (sk) {
+		rcu_assign_pointer(ns_sk->rxe_sk6, NULL);
+		udp_tunnel_sock_release(sk->sk_socket);
+	}
+#endif
+}
+
+/*
+ * callback to make the module network namespace aware
+ */
+static struct pernet_operations rxe_net_ops = {
+	.init = rxe_ns_init,
+	.exit = rxe_ns_exit,
+	.id = &rxe_pernet_id,
+	.size = sizeof(struct rxe_ns_sock),
+};
+
+struct sock *rxe_ns_pernet_sk4(struct net *net)
+{
+	struct rxe_ns_sock *ns_sk = net_generic(net, rxe_pernet_id);
+	struct sock *sk;
+
+	rcu_read_lock();
+	sk = rcu_dereference(ns_sk->rxe_sk4);
+	rcu_read_unlock();
+
+	return sk;
+}
+
+void rxe_ns_pernet_set_sk4(struct net *net, struct sock *sk)
+{
+	struct rxe_ns_sock *ns_sk = net_generic(net, rxe_pernet_id);
+
+	rcu_assign_pointer(ns_sk->rxe_sk4, sk);
+	synchronize_rcu();
+}
+
+#if IS_ENABLED(CONFIG_IPV6)
+struct sock *rxe_ns_pernet_sk6(struct net *net)
+{
+	struct rxe_ns_sock *ns_sk = net_generic(net, rxe_pernet_id);
+	struct sock *sk;
+
+	rcu_read_lock();
+	sk = rcu_dereference(ns_sk->rxe_sk6);
+	rcu_read_unlock();
+
+	return sk;
+}
+
+void rxe_ns_pernet_set_sk6(struct net *net, struct sock *sk)
+{
+	struct rxe_ns_sock *ns_sk = net_generic(net, rxe_pernet_id);
+
+	rcu_assign_pointer(ns_sk->rxe_sk6, sk);
+	synchronize_rcu();
+}
+#endif /* IPV6 */
+
+int rxe_namespace_init(void)
+{
+	return register_pernet_subsys(&rxe_net_ops);
+}
+
+void rxe_namespace_exit(void)
+{
+	unregister_pernet_subsys(&rxe_net_ops);
+}

+26

drivers/infiniband/sw/rxe/rxe_ns.h

@@ -0,0 +1,26 @@
+/* SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB */
+
+#ifndef RXE_NS_H
+#define RXE_NS_H
+
+struct sock *rxe_ns_pernet_sk4(struct net *net);
+void rxe_ns_pernet_set_sk4(struct net *net, struct sock *sk);
+
+#if IS_ENABLED(CONFIG_IPV6)
+void rxe_ns_pernet_set_sk6(struct net *net, struct sock *sk);
+struct sock *rxe_ns_pernet_sk6(struct net *net);
+#else /* IPv6 */
+static inline struct sock *rxe_ns_pernet_sk6(struct net *net)
+{
+	return NULL;
+}
+
+static inline void rxe_ns_pernet_set_sk6(struct net *net, struct sock *sk)
+{
+}
+#endif /* IPv6 */
+
+int rxe_namespace_init(void);
+void rxe_namespace_exit(void);
+
+#endif /* RXE_NS_H */

+1 -1

drivers/infiniband/sw/rxe/rxe_odp.c

@@ -545,7 +545,7 @@
 		work->frags[i].mr = mr;
 	}
 
-	queue_work(system_unbound_wq, &work->work);
+	queue_work(rxe_wq, &work->work);
 
 	return 0;
 

+2 -1

drivers/infiniband/sw/rxe/rxe_recv.c

@@ -330,7 +330,8 @@
 	pkt->qp = NULL;
 	pkt->mask |= rxe_opcode[pkt->opcode].mask;
 
-	if (unlikely(skb->len < header_size(pkt)))
+	if (unlikely(pkt->paylen < header_size(pkt) + bth_pad(pkt) +
+		     RXE_ICRC_SIZE))
 		goto drop;
 
 	err = hdr_check(pkt);
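The new drop condition in rxe_rcv() measures pkt->paylen against everything a valid packet must carry: the transport headers, the BTH pad bytes and the trailing ICRC. A minimal user-space sketch of that arithmetic follows. `sketch_paylen_ok()` is a hypothetical helper, the 4-byte ICRC is assumed to match `RXE_ICRC_SIZE` (the InfiniBand ICRC is 32 bits), and `pad` is assumed to be the 2-bit BTH PadCnt value that `bth_pad()` returns.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed to match RXE_ICRC_SIZE: the InfiniBand ICRC is 32 bits. */
#define SKETCH_ICRC_SIZE 4u

/*
 * Hypothetical helper with the same shape as the new check: the packet
 * is long enough only if paylen covers the headers, the pad bytes and
 * the ICRC. Returns true when the packet would NOT be dropped.
 */
static bool sketch_paylen_ok(size_t paylen, size_t hdr_size, unsigned int pad)
{
	/* BTH PadCnt is a 2-bit field, so pad is 0..3. */
	assert(pad <= 3);

	return paylen >= hdr_size + pad + SKETCH_ICRC_SIZE;
}
```

With a 12-byte BTH as the only header and no padding, any packet whose paylen is below 16 bytes would be dropped.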

+43 -13

drivers/infiniband/sw/rxe/rxe_resp.c

@@ -37,6 +37,7 @@
 	[RESPST_ERR_MISSING_OPCODE_LAST_D1E] = "ERR_MISSING_OPCODE_LAST_D1E",
 	[RESPST_ERR_TOO_MANY_RDMA_ATM_REQ] = "ERR_TOO_MANY_RDMA_ATM_REQ",
 	[RESPST_ERR_RNR] = "ERR_RNR",
+	[RESPST_ERR_RKEY_VIOLATION_EVENT] = "ERR_RKEY_VIOLATION_EVENT",
 	[RESPST_ERR_RKEY_VIOLATION] = "ERR_RKEY_VIOLATION",
 	[RESPST_ERR_INVALIDATE_RKEY] = "ERR_INVALIDATE_RKEY_VIOLATION",
 	[RESPST_ERR_LENGTH] = "ERR_LENGTH",
@@ -424,6 +423,19 @@
 	qp->resp.resid = sizeof(u64);
 }
 
+/* Transition to an rkey violation state. C9-222.1 requires an async event
+ * at the responder, but only if the error cannot be attached to an RX WQE.
+ * WRITE_WITH_IMM is the only op that might have that more precise RX WQE
+ * to pin the error on.
+ */
+static enum resp_states get_rkey_violation_state(struct rxe_pkt_info *pkt)
+{
+	if (pkt->mask & RXE_IMMDT_MASK)
+		return RESPST_ERR_RKEY_VIOLATION;
+
+	return RESPST_ERR_RKEY_VIOLATION_EVENT;
+}
+
 /* resolve the packet rkey to qp->resp.mr or set qp->resp.mr to NULL
  * if an invalid rkey is received or the rdma length is zero. For middle
  * or last packets use the stored value of mr.
@@ -500,14 +486,14 @@
 		mw = rxe_lookup_mw(qp, access, rkey);
 		if (!mw) {
 			rxe_dbg_qp(qp, "no MW matches rkey %#x\n", rkey);
-			state = RESPST_ERR_RKEY_VIOLATION;
+			state = get_rkey_violation_state(pkt);
 			goto err;
 		}
 
 		mr = mw->mr;
 		if (!mr) {
 			rxe_dbg_qp(qp, "MW doesn't have an MR\n");
-			state = RESPST_ERR_RKEY_VIOLATION;
+			state = get_rkey_violation_state(pkt);
 			goto err;
 		}
 
@@ -521,7 +507,7 @@
 		mr = lookup_mr(qp->pd, access, rkey, RXE_LOOKUP_REMOTE);
 		if (!mr) {
 			rxe_dbg_qp(qp, "no MR matches rkey %#x\n", rkey);
-			state = RESPST_ERR_RKEY_VIOLATION;
+			state = get_rkey_violation_state(pkt);
 			goto err;
 		}
 	}
@@ -535,7 +521,7 @@
 	}
 
 	if (mr_check_range(mr, va + qp->resp.offset, resid)) {
-		state = RESPST_ERR_RKEY_VIOLATION;
+		state = get_rkey_violation_state(pkt);
 		goto err;
 	}
 
@@ -600,7 +586,7 @@
 	err = rxe_mr_copy(qp->resp.mr, qp->resp.va + qp->resp.offset,
 			  payload_addr(pkt), data_len, RXE_TO_MR_OBJ);
 	if (err) {
-		rc = RESPST_ERR_RKEY_VIOLATION;
+		rc = get_rkey_violation_state(pkt);
 		goto out;
 	}
 
@@ -681,7 +667,7 @@
 
 	if (res->flush.type & IB_FLUSH_PERSISTENT) {
 		if (rxe_flush_pmem_iova(mr, start, length))
-			return RESPST_ERR_RKEY_VIOLATION;
+			return get_rkey_violation_state(pkt);
 		/* Make data persistent. */
 		wmb();
 	} else if (res->flush.type & IB_FLUSH_GLOBAL) {
@@ -1397,6 +1383,20 @@
 	return rc;
 }
 
+static void do_qp_event(struct rxe_qp *qp, enum ib_event_type etype)
+{
+	struct ib_event event;
+	struct ib_qp *ibqp = &qp->ibqp;
+
+	event.event = etype;
+	event.device = ibqp->device;
+	event.element.qp = ibqp;
+	if (ibqp->event_handler) {
+		rxe_dbg_qp(qp, "reporting QP event %d\n", etype);
+		ibqp->event_handler(&event, ibqp->qp_context);
+	}
+}
+
 /* Process a class A or C. Both are treated the same in this implementation. */
 static void do_class_ac_error(struct rxe_qp *qp, u8 syndrome,
 			      enum ib_wc_status status)
@@ -1504,14 +1476,9 @@
 	int err;
 
 	if (qp->srq) {
-		if (notify && qp->ibqp.event_handler) {
-			struct ib_event ev;
+		if (notify && qp->ibqp.event_handler)
+			do_qp_event(qp, IB_EVENT_QP_LAST_WQE_REACHED);
 
-			ev.device = qp->ibqp.device;
-			ev.element.qp = &qp->ibqp;
-			ev.event = IB_EVENT_QP_LAST_WQE_REACHED;
-			qp->ibqp.event_handler(&ev, qp->ibqp.qp_context);
-		}
 		return;
 	}
 
@@ -1634,6 +1611,13 @@
 				qp->resp.drop_msg = 1;
 			}
 			state = RESPST_CLEANUP;
+			break;
+
+		case RESPST_ERR_RKEY_VIOLATION_EVENT:
+			if (qp_type(qp) == IB_QPT_RC)
+				do_qp_event(qp, IB_EVENT_QP_ACCESS_ERR);
+
+			state = RESPST_ERR_RKEY_VIOLATION;
 			break;
 
 		case RESPST_ERR_RKEY_VIOLATION:
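The rxe_resp.c changes route every rkey failure through get_rkey_violation_state(): packets carrying immediate data (WRITE_WITH_IMM) go straight to RESPST_ERR_RKEY_VIOLATION, while all other ops pass through RESPST_ERR_RKEY_VIOLATION_EVENT, which raises IB_EVENT_QP_ACCESS_ERR on RC QPs before falling into the ordinary violation handling. The user-space model below sketches that decision flow only; the enums, the flag bit and the event counter are hypothetical stand-ins, not the kernel's definitions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the rxe responder error states. */
enum sketch_state {
	SKETCH_ERR_RKEY_VIOLATION_EVENT,
	SKETCH_ERR_RKEY_VIOLATION,
	SKETCH_DONE,
};

/* Hypothetical stand-ins for IB_QPT_RC / IB_QPT_UC. */
enum sketch_qp_type { SKETCH_QPT_RC, SKETCH_QPT_UC };

/* Assumed bit standing in for RXE_IMMDT_MASK (set for WRITE_WITH_IMM). */
#define SKETCH_IMMDT_MASK (1u << 0)

/* Counts simulated IB_EVENT_QP_ACCESS_ERR deliveries. */
static int sketch_access_err_events;

/* Same decision as get_rkey_violation_state(): an op with immediate data
 * can report the error on its RX WQE; everything else needs the event. */
static enum sketch_state sketch_rkey_violation_state(unsigned int mask)
{
	if (mask & SKETCH_IMMDT_MASK)
		return SKETCH_ERR_RKEY_VIOLATION;

	return SKETCH_ERR_RKEY_VIOLATION_EVENT;
}

/* Drives the two error states; returns true once the flow reaches the
 * ordinary rkey-violation handling. */
static bool sketch_handle_rkey_error(enum sketch_qp_type type,
				     unsigned int mask)
{
	enum sketch_state state = sketch_rkey_violation_state(mask);
	bool handled = false;

	while (state != SKETCH_DONE) {
		switch (state) {
		case SKETCH_ERR_RKEY_VIOLATION_EVENT:
			/* Only RC QPs raise the async access-error event. */
			if (type == SKETCH_QPT_RC)
				sketch_access_err_events++;
			state = SKETCH_ERR_RKEY_VIOLATION;
			break;
		case SKETCH_ERR_RKEY_VIOLATION:
			/* rxe's existing violation handling would run here. */
			handled = true;
			state = SKETCH_DONE;
			break;
		default:
			assert(!"unexpected state");
			state = SKETCH_DONE;
			break;
		}
	}
	return handled;
}
```

Keeping the event in its own state, rather than raising it inside each lookup failure, lets every error site in the diff share one call to get_rkey_violation_state().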

+1 -1

drivers/infiniband/sw/rxe/rxe_task.c

@@ -6,7 +6,7 @@
 
 #include "rxe.h"
 
-static struct workqueue_struct *rxe_wq;
+struct workqueue_struct *rxe_wq;
 
 int rxe_alloc_wq(void)
 {

+10 -23

drivers/infiniband/sw/rxe/rxe_verbs.c

@@ -452,18 +452,9 @@
 	int err;
 
 	if (udata) {
-		if (udata->inlen < sizeof(cmd)) {
-			err = -EINVAL;
-			rxe_dbg_srq(srq, "malformed udata\n");
+		err = ib_copy_validate_udata_in(udata, cmd, mmap_info_addr);
+		if (err)
 			goto err_out;
-		}
-
-		err = ib_copy_from_udata(&cmd, udata, sizeof(cmd));
-		if (err) {
-			err = -EFAULT;
-			rxe_dbg_srq(srq, "unable to read udata\n");
-			goto err_out;
-		}
 	}
 
 	err = rxe_srq_chk_attr(rxe, srq, attr, mask);
@@ -1088,11 +1097,8 @@
 		goto err_out;
 	}
 
-	err = rxe_cq_chk_attr(rxe, NULL, attr->cqe, attr->comp_vector);
-	if (err) {
-		rxe_dbg_dev(rxe, "bad init attributes, err = %d\n", err);
-		goto err_out;
-	}
+	if (attr->cqe > rxe->attr.max_cqe)
+		return -EINVAL;
 
 	err = rxe_add_to_pool(&rxe->cq_pool, cq);
 	if (err) {
@@ -1115,7 +1127,8 @@
 	return err;
 }
 
-static int rxe_resize_cq(struct ib_cq *ibcq, int cqe, struct ib_udata *udata)
+static int rxe_resize_cq(struct ib_cq *ibcq, unsigned int cqe,
+			 struct ib_udata *udata)
 {
 	struct rxe_cq *cq = to_rcq(ibcq);
 	struct rxe_dev *rxe = to_rdev(ibcq->device);
@@ -1132,11 +1143,9 @@
 		uresp = udata->outbuf;
 	}
 
-	err = rxe_cq_chk_attr(rxe, cq, cqe, 0);
-	if (err) {
-		rxe_dbg_cq(cq, "bad attr, err = %d\n", err);
-		goto err_out;
-	}
+	if (cqe > rxe->attr.max_cqe ||
+	    cqe < queue_count(cq->queue, QUEUE_TYPE_TO_CLIENT))
+		return -EINVAL;
 
 	err = rxe_cq_resize_queue(cq, cqe, uresp, udata);
 	if (err) {
@@ -1506,7 +1519,7 @@
 	.reg_user_mr = rxe_reg_user_mr,
 	.req_notify_cq = rxe_req_notify_cq,
 	.rereg_user_mr = rxe_rereg_user_mr,
-	.resize_cq = rxe_resize_cq,
+	.resize_user_cq = rxe_resize_cq,
 
 	INIT_RDMA_OBJ_SIZE(ib_ah, rxe_ah, ibah),
 	INIT_RDMA_OBJ_SIZE(ib_cq, rxe_cq, ibcq),
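After this change, rxe_create_cq() and rxe_resize_cq() check the requested CQE count inline instead of calling rxe_cq_chk_attr(). Creation only caps the size at the device's `max_cqe`; resizing additionally rejects a size smaller than the number of entries currently queued to the consumer. The sketch restates the two conditions as plain functions. The names are hypothetical; `max_cqe` stands in for `rxe->attr.max_cqe` and `queued` for `queue_count(cq->queue, QUEUE_TYPE_TO_CLIENT)`.

```c
#include <errno.h>

/* Hypothetical mirror of the new create-time check. */
static int sketch_create_cq_check(unsigned int cqe, unsigned int max_cqe)
{
	if (cqe > max_cqe)
		return -EINVAL;

	return 0;
}

/* Hypothetical mirror of the new resize-time check: never grow past the
 * device limit, and never shrink below the entries still queued. */
static int sketch_resize_cq_check(unsigned int cqe, unsigned int max_cqe,
				  unsigned int queued)
{
	if (cqe > max_cqe || cqe < queued)
		return -EINVAL;

	return 0;
}
```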

+1

drivers/infiniband/sw/rxe/rxe_verbs.h

@@ -154,6 +154,7 @@
 	RESPST_ERR_MISSING_OPCODE_LAST_D1E,
 	RESPST_ERR_TOO_MANY_RDMA_ATM_REQ,
 	RESPST_ERR_RNR,
+	RESPST_ERR_RKEY_VIOLATION_EVENT,
 	RESPST_ERR_RKEY_VIOLATION,
 	RESPST_ERR_INVALIDATE_RKEY,
 	RESPST_ERR_LENGTH,

+1 -5

drivers/infiniband/sw/siw/siw_verbs.c

@@ -1373,11 +1373,7 @@
 		struct siw_uresp_reg_mr uresp = {};
 		struct siw_mem *mem = mr->mem;
 
-		if (udata->inlen < sizeof(ureq)) {
-			rv = -EINVAL;
-			goto err_out;
-		}
-		rv = ib_copy_from_udata(&ureq, udata, sizeof(ureq));
+		rv = ib_copy_validate_udata_in(udata, ureq, pad);
 		if (rv)
 			goto err_out;
 

-1

drivers/infiniband/ulp/Makefile

@@ -4,5 +4,4 @@
 obj-$(CONFIG_INFINIBAND_SRPT) += srpt/
 obj-$(CONFIG_INFINIBAND_ISER) += iser/
 obj-$(CONFIG_INFINIBAND_ISERT) += isert/
-obj-$(CONFIG_INFINIBAND_OPA_VNIC) += opa_vnic/
 obj-$(CONFIG_INFINIBAND_RTRS) += rtrs/

-9

drivers/infiniband/ulp/opa_vnic/Kconfig

@@ -1,9 +0,0 @@
-# SPDX-License-Identifier: GPL-2.0-only
-config INFINIBAND_OPA_VNIC
-	tristate "Cornelis OPX VNIC support"
-	depends on X86_64 && INFINIBAND
-	help
-	This is Omni-Path Express (OPX) Virtual Network Interface Controller (VNIC)
-	driver for Ethernet over Omni-Path feature. It implements the HW
-	independent VNIC functionality. It interfaces with Linux stack for
-	data path and IB MAD for the control path.

-9

drivers/infiniband/ulp/opa_vnic/Makefile

@@ -1,9 +0,0 @@
-# SPDX-License-Identifier: GPL-2.0-only
-# Makefile - Cornelis Omni-Path Express Virtual Network Controller driver
-# Copyright(c) 2017, Intel Corporation.
-# Copyright(c) 2021, Cornelis Networks.
-#
-obj-$(CONFIG_INFINIBAND_OPA_VNIC) += opa_vnic.o
-
-opa_vnic-y := opa_vnic_netdev.o opa_vnic_encap.o opa_vnic_ethtool.o \
-	opa_vnic_vema.o opa_vnic_vema_iface.o

-513

drivers/infiniband/ulp/opa_vnic/opa_vnic_encap.c

@@ -1,513 +0,0 @@
-/*
- * Copyright(c) 2017 Intel Corporation.
- *
- * This file is provided under a dual BSD/GPLv2 license. When using or
- * redistributing this file, you may do so under either license.
- *
- * GPL LICENSE SUMMARY
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of version 2 of the GNU General Public License as
- * published by the Free Software Foundation.
- *
- * This program is distributed in the hope that it will be useful, but
- * WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
- * General Public License for more details.
- *
- * BSD LICENSE
- *
- * Redistribution and use in source and binary forms, with or without
- * modification, are permitted provided that the following conditions
- * are met:
- *
- * - Redistributions of source code must retain the above copyright
- * notice, this list of conditions and the following disclaimer.
- * - Redistributions in binary form must reproduce the above copyright
- * notice, this list of conditions and the following disclaimer in
- * the documentation and/or other materials provided with the
- * distribution.
- * - Neither the name of Intel Corporation nor the names of its
- * contributors may be used to endorse or promote products derived
- * from this software without specific prior written permission.
- *
- * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
- * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
- * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
- * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
- * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
- * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
- * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
- * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
- * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
- * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
- * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
- *
- */
-
-/*
- * This file contains OPA VNIC encapsulation/decapsulation function.
- */
-
-#include <linux/if_ether.h>
-#include <linux/if_vlan.h>
-
-#include "opa_vnic_internal.h"
-
-/* OPA 16B Header fields */
-#define OPA_16B_LID_MASK 0xFFFFFull
-#define OPA_16B_SLID_HIGH_SHFT 8
-#define OPA_16B_SLID_MASK 0xF00ull
-#define OPA_16B_DLID_MASK 0xF000ull
-#define OPA_16B_DLID_HIGH_SHFT 12
-#define OPA_16B_LEN_SHFT 20
-#define OPA_16B_SC_SHFT 20
-#define OPA_16B_RC_SHFT 25
-#define OPA_16B_PKEY_SHFT 16
-
-#define OPA_VNIC_L4_HDR_SHFT 16
-
-/* L2+L4 hdr len is 20 bytes (5 quad words) */
-#define OPA_VNIC_HDR_QW_LEN 5
-
-static inline void opa_vnic_make_header(u8 *hdr, u32 slid, u32 dlid, u16 len,
-					u16 pkey, u16 entropy, u8 sc, u8 rc,
-					u8 l4_type, u16 l4_hdr)
-{
-	/* h[1]: LT=1, 16B L2=10 */
-	u32 h[OPA_VNIC_HDR_QW_LEN] = {0, 0xc0000000, 0, 0, 0};
-
-	h[2] = l4_type;
-	h[3] = entropy;
-	h[4] = l4_hdr << OPA_VNIC_L4_HDR_SHFT;
-
-	/* Extract and set 4 upper bits and 20 lower bits of the lids */
-	h[0] |= (slid & OPA_16B_LID_MASK);
-	h[2] |= ((slid >> (20 - OPA_16B_SLID_HIGH_SHFT)) & OPA_16B_SLID_MASK);
-
-	h[1] |= (dlid & OPA_16B_LID_MASK);
-	h[2] |= ((dlid >> (20 - OPA_16B_DLID_HIGH_SHFT)) & OPA_16B_DLID_MASK);
-
-	h[0] |= (len << OPA_16B_LEN_SHFT);
-	h[1] |= (rc << OPA_16B_RC_SHFT);
-	h[1] |= (sc << OPA_16B_SC_SHFT);
-	h[2] |= ((u32)pkey << OPA_16B_PKEY_SHFT);
-
-	memcpy(hdr, h, OPA_VNIC_HDR_LEN);
-}
-
-/*
- * Using a simple hash table for mac table implementation with the last octet
- * of mac address as a key.
- */
-static void opa_vnic_free_mac_tbl(struct hlist_head *mactbl)
-{
-	struct opa_vnic_mac_tbl_node *node;
-	struct hlist_node *tmp;
-	int bkt;
-
-	if (!mactbl)
-		return;
-
-	vnic_hash_for_each_safe(mactbl, bkt, tmp, node, hlist) {
-		hash_del(&node->hlist);
-		kfree(node);
-	}
-	kfree(mactbl);
-}
-
-static struct hlist_head *opa_vnic_alloc_mac_tbl(void)
-{
-	u32 size = sizeof(struct hlist_head) * OPA_VNIC_MAC_TBL_SIZE;
-	struct hlist_head *mactbl;
-
-	mactbl = kzalloc(size, GFP_KERNEL);
-	if (!mactbl)
-		return ERR_PTR(-ENOMEM);
-
-	vnic_hash_init(mactbl);
-	return mactbl;
-}
-
-/* opa_vnic_release_mac_tbl - empty and free the mac table */
-void opa_vnic_release_mac_tbl(struct opa_vnic_adapter *adapter)
-{
-	struct hlist_head *mactbl;
-
-	mutex_lock(&adapter->mactbl_lock);
-	mactbl = rcu_access_pointer(adapter->mactbl);
-	rcu_assign_pointer(adapter->mactbl, NULL);
-	synchronize_rcu();
-	opa_vnic_free_mac_tbl(mactbl);
-	adapter->info.vport.mac_tbl_digest = 0;
-	mutex_unlock(&adapter->mactbl_lock);
-}
-
-/*
- * opa_vnic_query_mac_tbl - query the mac table for a section
- *
- * This function implements query of specific function of the mac table.
- * The function also expects the requested range to be valid.
- */
-void opa_vnic_query_mac_tbl(struct opa_vnic_adapter *adapter,
-			    struct opa_veswport_mactable *tbl)
-{
-	struct opa_vnic_mac_tbl_node *node;
-	struct hlist_head *mactbl;
-	int bkt;
-	u16 loffset, lnum_entries;
-
-	rcu_read_lock();
-	mactbl = rcu_dereference(adapter->mactbl);
-	if (!mactbl)
-		goto get_mac_done;
-
-	loffset = be16_to_cpu(tbl->offset);
-	lnum_entries = be16_to_cpu(tbl->num_entries);
-
-	vnic_hash_for_each(mactbl, bkt, node, hlist) {
-		struct __opa_vnic_mactable_entry *nentry = &node->entry;
-		struct opa_veswport_mactable_entry *entry;
-
-		if ((node->index < loffset) ||
-		    (node->index >= (loffset + lnum_entries)))
-			continue;
-
-		/* populate entry in the tbl corresponding to the index */
-		entry = &tbl->tbl_entries[node->index - loffset];
-		memcpy(entry->mac_addr, nentry->mac_addr,
-		       ARRAY_SIZE(entry->mac_addr));
-		memcpy(entry->mac_addr_mask, nentry->mac_addr_mask,
-		       ARRAY_SIZE(entry->mac_addr_mask));
-		entry->dlid_sd = cpu_to_be32(nentry->dlid_sd);
-	}
-	tbl->mac_tbl_digest = cpu_to_be32(adapter->info.vport.mac_tbl_digest);
-get_mac_done:
-	rcu_read_unlock();
-}
-
-/*
- * opa_vnic_update_mac_tbl - update mac table section
- *
- * This function updates the specified section of the mac table.
- * The procedure includes following steps.
- * - Allocate a new mac (hash) table.
- * - Add the specified entries to the new table.
- * (except the ones that are requested to be deleted).
- * - Add all the other entries from the old mac table.
- * - If there is a failure, free the new table and return.
- * - Switch to the new table.
- * - Free the old table and return.
- *
- * The function also expects the requested range to be valid.
- */
-int opa_vnic_update_mac_tbl(struct opa_vnic_adapter *adapter,
-			    struct opa_veswport_mactable *tbl)
-{
-	struct opa_vnic_mac_tbl_node *node, *new_node;
-	struct hlist_head *new_mactbl, *old_mactbl;
-	int i, bkt, rc = 0;
-	u8 key;
-	u16 loffset, lnum_entries;
-
-	mutex_lock(&adapter->mactbl_lock);
-	/* allocate new mac table */
-	new_mactbl = opa_vnic_alloc_mac_tbl();
-	if (IS_ERR(new_mactbl)) {
-		mutex_unlock(&adapter->mactbl_lock);
-		return PTR_ERR(new_mactbl);
-	}
-
-	loffset = be16_to_cpu(tbl->offset);
-	lnum_entries = be16_to_cpu(tbl->num_entries);
-
-	/* add updated entries to the new mac table */
-	for (i = 0; i < lnum_entries; i++) {
-		struct __opa_vnic_mactable_entry *nentry;
-		struct opa_veswport_mactable_entry *entry =
-			&tbl->tbl_entries[i];
-		u8 *mac_addr = entry->mac_addr;
-		u8 empty_mac[ETH_ALEN] = { 0 };
-
-		v_dbg("new mac entry %4d: %02x:%02x:%02x:%02x:%02x:%02x %x\n",
-		      loffset + i, mac_addr[0], mac_addr[1], mac_addr[2],
-		      mac_addr[3], mac_addr[4], mac_addr[5],
-		      entry->dlid_sd);
-
-		/* if the entry is being removed, do not add it */
-		if (!memcmp(mac_addr, empty_mac, ARRAY_SIZE(empty_mac)))
-			continue;
-
-		node = kzalloc_obj(*node);
-		if (!node) {
-			rc = -ENOMEM;
-			goto updt_done;
-		}
-
-		node->index = loffset + i;
-		nentry = &node->entry;
-		memcpy(nentry->mac_addr, entry->mac_addr,
-		       ARRAY_SIZE(nentry->mac_addr));
-		memcpy(nentry->mac_addr_mask, entry->mac_addr_mask,
-		       ARRAY_SIZE(nentry->mac_addr_mask));
-		nentry->dlid_sd = be32_to_cpu(entry->dlid_sd);
-		key = node->entry.mac_addr[OPA_VNIC_MAC_HASH_IDX];
-		vnic_hash_add(new_mactbl, &node->hlist, key);
-	}
-
-	/* add other entries from current mac table to new mac table */
-	old_mactbl = rcu_access_pointer(adapter->mactbl);
-	if (!old_mactbl)
-		goto switch_tbl;
-
-	vnic_hash_for_each(old_mactbl, bkt, node, hlist) {
-		if ((node->index >= loffset) &&
-		    (node->index < (loffset + lnum_entries)))
-			continue;
-
-		new_node = kzalloc_obj(*new_node);
-		if (!new_node) {
-			rc = -ENOMEM;
-			goto updt_done;
-		}
-
-		new_node->index = node->index;
-		memcpy(&new_node->entry, &node->entry, sizeof(node->entry));
-		key = new_node->entry.mac_addr[OPA_VNIC_MAC_HASH_IDX];
-		vnic_hash_add(new_mactbl, &new_node->hlist, key);
-	}
-
-switch_tbl:
-	/* switch to new table */
-	rcu_assign_pointer(adapter->mactbl, new_mactbl);
-	synchronize_rcu();
-
-	adapter->info.vport.mac_tbl_digest = be32_to_cpu(tbl->mac_tbl_digest);
-updt_done:
-	/* upon failure, free the new table; otherwise, free the old table */
-	if (rc)
-		opa_vnic_free_mac_tbl(new_mactbl);
-	else
-		opa_vnic_free_mac_tbl(old_mactbl);
-
-	mutex_unlock(&adapter->mactbl_lock);
-	return rc;
-}
-
-/* opa_vnic_chk_mac_tbl - check mac table for dlid */
-static uint32_t opa_vnic_chk_mac_tbl(struct opa_vnic_adapter *adapter,
-				     struct ethhdr *mac_hdr)
-{
-	struct opa_vnic_mac_tbl_node *node;
-	struct hlist_head *mactbl;
-	u32 dlid = 0;
-	u8 key;
-
-	rcu_read_lock();
-	mactbl = rcu_dereference(adapter->mactbl);
-	if (unlikely(!mactbl))
-		goto chk_done;
-
-	key = mac_hdr->h_dest[OPA_VNIC_MAC_HASH_IDX];
-	vnic_hash_for_each_possible(mactbl, node, hlist, key) {
-		struct __opa_vnic_mactable_entry *entry = &node->entry;
-
-		/* if related to source mac, skip */
-		if (unlikely(OPA_VNIC_DLID_SD_IS_SRC_MAC(entry->dlid_sd)))
-			continue;
-
-		if (!memcmp(node->entry.mac_addr, mac_hdr->h_dest,
-			    ARRAY_SIZE(node->entry.mac_addr))) {
-			/* mac address found */
-			dlid = OPA_VNIC_DLID_SD_GET_DLID(node->entry.dlid_sd);
-			break;
-		}
-	}
-
-chk_done:
-	rcu_read_unlock();
-	return dlid;
-}
-
-/* opa_vnic_get_dlid - find and return the DLID */
-static uint32_t opa_vnic_get_dlid(struct opa_vnic_adapter *adapter,
-				  struct sk_buff *skb, u8 def_port)
-{
-	struct __opa_veswport_info *info = &adapter->info;
-	struct ethhdr *mac_hdr = (struct ethhdr *)skb_mac_header(skb);
-	u32 dlid;
-
-	dlid = opa_vnic_chk_mac_tbl(adapter, mac_hdr);
-	if (dlid)
-		return dlid;
-
-	if (is_multicast_ether_addr(mac_hdr->h_dest)) {
-		dlid = info->vesw.u_mcast_dlid;
-	} else {
-		if (is_local_ether_addr(mac_hdr->h_dest)) {
-			dlid = ((uint32_t)mac_hdr->h_dest[5] << 16) |
-				((uint32_t)mac_hdr->h_dest[4] << 8) |
-				mac_hdr->h_dest[3];
-			if (unlikely(!dlid))
-				v_warn("Null dlid in MAC address\n");
-		} else if (def_port != OPA_VNIC_INVALID_PORT) {
-			if (def_port < OPA_VESW_MAX_NUM_DEF_PORT)
-				dlid = info->vesw.u_ucast_dlid[def_port];
-		}
-	}
-
-	return dlid;
-}
-
-/* opa_vnic_get_sc - return the service class */
-static u8 opa_vnic_get_sc(struct __opa_veswport_info *info,
-			  struct sk_buff *skb)
-{
-	struct ethhdr *mac_hdr = (struct ethhdr *)skb_mac_header(skb);
-	u16 vlan_tci;
-	u8 sc;
-
-	if (!__vlan_get_tag(skb, &vlan_tci)) {
-		u8 pcp = OPA_VNIC_VLAN_PCP(vlan_tci);
-
-		if (is_multicast_ether_addr(mac_hdr->h_dest))
-			sc = info->vport.pcp_to_sc_mc[pcp];
-		else
-			sc = info->vport.pcp_to_sc_uc[pcp];
-	} else {
-		if (is_multicast_ether_addr(mac_hdr->h_dest))
-			sc = info->vport.non_vlan_sc_mc;
-		else
-			sc = info->vport.non_vlan_sc_uc;
-	}
-
-	return sc;
-}
-
-u8 opa_vnic_get_vl(struct opa_vnic_adapter *adapter, struct sk_buff *skb)
-{
-	struct ethhdr *mac_hdr = (struct ethhdr *)skb_mac_header(skb);
-	struct __opa_veswport_info *info = &adapter->info;
-	u8 vl;
-
-	if (skb_vlan_tag_present(skb)) {
-		u8 pcp = skb_vlan_tag_get(skb) >> VLAN_PRIO_SHIFT;
-
-		if (is_multicast_ether_addr(mac_hdr->h_dest))
-			vl = info->vport.pcp_to_vl_mc[pcp];
-		else
-			vl = info->vport.pcp_to_vl_uc[pcp];
-	} else {
-		if (is_multicast_ether_addr(mac_hdr->h_dest))
-			vl = info->vport.non_vlan_vl_mc;
-		else
-			vl = info->vport.non_vlan_vl_uc;
-	}
-
-	return vl;
-}
-
-/* opa_vnic_get_rc - return the routing control */
-static u8 opa_vnic_get_rc(struct __opa_veswport_info *info,
-			  struct sk_buff *skb)
-{
-	u8 proto, rout_ctrl;
-
-	switch (vlan_get_protocol(skb)) {
-	case htons(ETH_P_IPV6):
-		proto = ipv6_hdr(skb)->nexthdr;
-		if (proto == IPPROTO_TCP)
-			rout_ctrl = OPA_VNIC_ENCAP_RC_EXT(info->vesw.rc,
-							  IPV6_TCP);
-		else if (proto == IPPROTO_UDP)
-			rout_ctrl = OPA_VNIC_ENCAP_RC_EXT(info->vesw.rc,
-							  IPV6_UDP);
-		else
-			rout_ctrl = OPA_VNIC_ENCAP_RC_EXT(info->vesw.rc, IPV6);
-		break;
-	case htons(ETH_P_IP):
-		proto = ip_hdr(skb)->protocol;
-		if (proto == IPPROTO_TCP)
-			rout_ctrl = OPA_VNIC_ENCAP_RC_EXT(info->vesw.rc,
-							  IPV4_TCP);
-		else if (proto == IPPROTO_UDP)
-			rout_ctrl = OPA_VNIC_ENCAP_RC_EXT(info->vesw.rc,
-							  IPV4_UDP);
-		else
-			rout_ctrl = OPA_VNIC_ENCAP_RC_EXT(info->vesw.rc, IPV4);
-		break;
-	default:
-		rout_ctrl = OPA_VNIC_ENCAP_RC_EXT(info->vesw.rc, DEFAULT);
-	}
-
-	return rout_ctrl;
-}
-
-/* opa_vnic_calc_entropy - calculate the packet entropy */
-u8 opa_vnic_calc_entropy(struct sk_buff *skb)
-{
-	u32 hash = skb_get_hash(skb);
-
-	/* store XOR of all bytes in lower 8 bits */
-	hash ^= hash >> 8;
-	hash ^= hash >> 16;
-
-	/* return lower 8 bits as entropy */
-	return (u8)(hash & 0xFF);
-}
-
-/* opa_vnic_get_def_port - get default port based on entropy */
-static inline u8 opa_vnic_get_def_port(struct opa_vnic_adapter *adapter,
-				       u8 entropy)
-{
-	u8 flow_id;
-
-	/* Add the upper and lower 4-bits of entropy to get the flow id */
-	flow_id = ((entropy & 0xf) + (entropy >> 4));
-	return adapter->flow_tbl[flow_id & (OPA_VNIC_FLOW_TBL_SIZE - 1)];
-}
-
-/* Calculate packet length including OPA header, crc and padding */
-static inline int opa_vnic_wire_length(struct sk_buff *skb)
-{
-	u32 pad_len;
-
-	/* padding for 8 bytes size alignment */
-	pad_len = -(skb->len + OPA_VNIC_ICRC_TAIL_LEN) & 0x7;
-	pad_len += OPA_VNIC_ICRC_TAIL_LEN;
-
-	return (skb->len + pad_len) >> 3;
-}
-
-/* opa_vnic_encap_skb - encapsulate skb packet with OPA header and meta data */
-void opa_vnic_encap_skb(struct opa_vnic_adapter *adapter, struct sk_buff *skb)
-{
-	struct __opa_veswport_info *info = &adapter->info;
-	struct opa_vnic_skb_mdata *mdata;
-	u8 def_port, sc, rc, entropy, *hdr;
-	u16 len, l4_hdr;
-	u32 dlid;
-
-	hdr = skb_push(skb, OPA_VNIC_HDR_LEN);
-
-	entropy = opa_vnic_calc_entropy(skb);
-	def_port = opa_vnic_get_def_port(adapter, entropy);
-	len = opa_vnic_wire_length(skb);
-	dlid = opa_vnic_get_dlid(adapter, skb, def_port);
-	sc = opa_vnic_get_sc(info, skb);
-	rc = opa_vnic_get_rc(info, skb);
-	l4_hdr = info->vesw.vesw_id;
-
-	mdata = skb_push(skb, sizeof(*mdata));
-	mdata->vl = opa_vnic_get_vl(adapter, skb);
-	mdata->entropy = entropy;
-	mdata->flags = 0;
-	if (unlikely(!dlid)) {
-		mdata->flags = OPA_VNIC_SKB_MDATA_ENCAP_ERR;
-		return;
-	}
-
-	opa_vnic_make_header(hdr, info->vport.encap_slid, dlid, len,
-			     info->vesw.pkey, entropy, sc, rc,
-			     OPA_VNIC_L4_ETHR, l4_hdr);
-}
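Several steps of the removed opa_vnic_encap_skb() path are pure integer arithmetic and can be checked outside the kernel: folding the 32-bit skb hash into 8 bits of entropy, mapping that entropy to a flow-table slot, computing the wire length in 8-byte words, and splitting 24-bit LIDs across the 16B header words as opa_vnic_make_header() does. The sketch reproduces only that arithmetic, with hypothetical function names. Assumptions: the ICRC+tail length is passed as a parameter because `OPA_VNIC_ICRC_TAIL_LEN` is defined in opa_vnic_internal.h, which is not shown here; the flow-table size is assumed to be a power of two, as the `& (OPA_VNIC_FLOW_TBL_SIZE - 1)` mask implies; and the LID helper ignores every other header field.

```c
#include <assert.h>
#include <stdint.h>

/* Fold a 32-bit flow hash into 8 bits as opa_vnic_calc_entropy() does:
 * after the two shifts, the low byte is the XOR of all four bytes. */
static uint8_t sketch_calc_entropy(uint32_t hash)
{
	hash ^= hash >> 8;
	hash ^= hash >> 16;
	return (uint8_t)(hash & 0xFF);
}

/* Flow id = sum of the two nibbles of the entropy, masked to the table
 * size (assumed to be a power of two). */
static unsigned int sketch_flow_index(uint8_t entropy, unsigned int tbl_size)
{
	uint8_t flow_id = (uint8_t)((entropy & 0xf) + (entropy >> 4));

	assert(tbl_size != 0 && (tbl_size & (tbl_size - 1)) == 0);
	return flow_id & (tbl_size - 1);
}

/* Wire length in 8-byte words: pad so that len + icrc_tail_len is a
 * multiple of 8, then count the ICRC/tail bytes themselves. */
static unsigned int sketch_wire_length(uint32_t len, uint32_t icrc_tail_len)
{
	uint32_t pad_len = -(len + icrc_tail_len) & 0x7;

	pad_len += icrc_tail_len;
	return (len + pad_len) >> 3;
}

/* LID placement only: the low 20 bits of each LID go into h[0] (SLID)
 * and h[1] (DLID); the upper 4 bits land in bits 8-11 (SLID) and
 * bits 12-15 (DLID) of h[2]. */
static void sketch_split_lids(uint32_t slid, uint32_t dlid, uint32_t h[3])
{
	h[0] = slid & 0xFFFFF;
	h[1] = dlid & 0xFFFFF;
	h[2] = ((slid >> (20 - 8)) & 0xF00) | ((dlid >> (20 - 12)) & 0xF000);
}
```

For instance, with a hypothetical ICRC+tail length of 5 bytes, a 60-byte buffer pads to 72 bytes, so `sketch_wire_length(60, 5)` returns 9.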

-524

drivers/infiniband/ulp/opa_vnic/opa_vnic_encap.h

@@ -1,524 +0,0 @@
-#ifndef _OPA_VNIC_ENCAP_H
-#define _OPA_VNIC_ENCAP_H
-/*
- * Copyright(c) 2017 Intel Corporation.
- *
- * This file is provided under a dual BSD/GPLv2 license. When using or
- * redistributing this file, you may do so under either license.
- *
- * GPL LICENSE SUMMARY
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of version 2 of the GNU General Public License as
- * published by the Free Software Foundation.
- *
- * This program is distributed in the hope that it will be useful, but
- * WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
- * General Public License for more details.
- *
- * BSD LICENSE
- *
- * Redistribution and use in source and binary forms, with or without
- * modification, are permitted provided that the following conditions
- * are met:
- *
- * - Redistributions of source code must retain the above copyright
- * notice, this list of conditions and the following disclaimer.
- * - Redistributions in binary form must reproduce the above copyright
- * notice, this list of conditions and the following disclaimer in
- * the documentation and/or other materials provided with the
- * distribution.
- * - Neither the name of Intel Corporation nor the names of its
- * contributors may be used to endorse or promote products derived
- * from this software without specific prior written permission.
- *
- * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
- * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
- * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
- * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
- * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
- * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
- * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
- * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
- * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
- * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
- * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
- *
- */
-
-/*
- * This file contains all OPA VNIC declaration required for encapsulation
- * and decapsulation of Ethernet packets
- */
-
-#include <linux/types.h>
-#include <rdma/ib_mad.h>
-
-/* EMA class version */
-#define OPA_EMA_CLASS_VERSION 0x80
-
-/*
- * Define the Intel vendor management class for OPA
- * ETHERNET MANAGEMENT
- */
-#define OPA_MGMT_CLASS_INTEL_EMA 0x34
-
-/* EM attribute IDs */
-#define OPA_EM_ATTR_CLASS_PORT_INFO 0x0001
-#define OPA_EM_ATTR_VESWPORT_INFO 0x0011
-#define OPA_EM_ATTR_VESWPORT_MAC_ENTRIES 0x0012
-#define OPA_EM_ATTR_IFACE_UCAST_MACS 0x0013
-#define OPA_EM_ATTR_IFACE_MCAST_MACS 0x0014
-#define OPA_EM_ATTR_DELETE_VESW 0x0015
-#define OPA_EM_ATTR_VESWPORT_SUMMARY_COUNTERS 0x0020
-#define OPA_EM_ATTR_VESWPORT_ERROR_COUNTERS 0x0022
-
-/* VNIC configured and operational state values */
-#define OPA_VNIC_STATE_DROP_ALL 0x1
-#define OPA_VNIC_STATE_FORWARDING 0x3
-
-#define OPA_VESW_MAX_NUM_DEF_PORT 16
-#define OPA_VNIC_MAX_NUM_PCP 8
-
-#define OPA_VNIC_EMA_DATA (OPA_MGMT_MAD_SIZE - IB_MGMT_VENDOR_HDR)
-
-/* Defines for vendor specific notice(trap) attributes */
-#define OPA_INTEL_EMA_NOTICE_TYPE_INFO 0x04
-
-/* INTEL OUI */
-#define INTEL_OUI_1 0x00
-#define INTEL_OUI_2 0x06
-#define INTEL_OUI_3 0x6a
-
-/* Trap opcodes sent from VNIC */
-#define OPA_VESWPORT_TRAP_IFACE_UCAST_MAC_CHANGE 0x1
-#define OPA_VESWPORT_TRAP_IFACE_MCAST_MAC_CHANGE 0x2
-#define OPA_VESWPORT_TRAP_ETH_LINK_STATUS_CHANGE 0x3
-
-#define OPA_VNIC_DLID_SD_IS_SRC_MAC(dlid_sd) (!!((dlid_sd) & 0x20))
-#define OPA_VNIC_DLID_SD_GET_DLID(dlid_sd) ((dlid_sd) >> 8)
-
-/* VNIC Ethernet link status */
-#define OPA_VNIC_ETH_LINK_UP 1
-#define OPA_VNIC_ETH_LINK_DOWN 2
-
-/* routing control */
-#define OPA_VNIC_ENCAP_RC_DEFAULT 0
-#define OPA_VNIC_ENCAP_RC_IPV4 4
-#define OPA_VNIC_ENCAP_RC_IPV4_UDP 8
-#define OPA_VNIC_ENCAP_RC_IPV4_TCP 12
-#define OPA_VNIC_ENCAP_RC_IPV6 16
-#define OPA_VNIC_ENCAP_RC_IPV6_TCP 20
-#define OPA_VNIC_ENCAP_RC_IPV6_UDP 24
-
-#define OPA_VNIC_ENCAP_RC_EXT(w, b) (((w) >> OPA_VNIC_ENCAP_RC_ ## b) & 0x7)
-
-/**
- * struct opa_vesw_info - OPA vnic switch information
- * @fabric_id: 10-bit fabric id
- * @vesw_id: 12-bit virtual ethernet switch id
- * @rsvd0: reserved bytes
- * @def_port_mask: bitmask of default ports
- * @rsvd1: reserved bytes
- * @pkey: partition key
- * @rsvd2: reserved bytes
- * @u_mcast_dlid: unknown multicast dlid
- * @u_ucast_dlid: array of unknown unicast dlids
- * @rsvd3: reserved bytes
- * @rc: routing control
- * @eth_mtu: Ethernet MTU
- * @rsvd4: reserved bytes
- */
-struct opa_vesw_info {
-	__be16 fabric_id;
-	__be16 vesw_id;
-
-	u8 rsvd0[6];
-	__be16 def_port_mask;
-
-	u8 rsvd1[2];
-	__be16 pkey;
-
-	u8 rsvd2[4];
-	__be32 u_mcast_dlid;
-	__be32 u_ucast_dlid[OPA_VESW_MAX_NUM_DEF_PORT];
-
-	__be32 rc;
-
-	u8 rsvd3[56];
-	__be16 eth_mtu;
-	u8 rsvd4[2];
-} __packed;
-
-/**
- * struct opa_per_veswport_info - OPA vnic per port information
- * @port_num: port number
- * @eth_link_status: current ethernet link state
- * @rsvd0: reserved bytes
- * @base_mac_addr: base mac address
- * @config_state: configured port state
- * @oper_state: operational port state
- * @max_mac_tbl_ent: max number of mac table entries
- * @max_smac_ent: max smac entries in mac table
- * @mac_tbl_digest: mac table digest
- * @rsvd1: reserved bytes
- * @encap_slid: base slid for the port
- * @pcp_to_sc_uc: sc by pcp index for unicast ethernet packets
- * @pcp_to_vl_uc: vl by pcp index for unicast ethernet packets
- * @pcp_to_sc_mc: sc by pcp index for multicast ethernet packets
- * @pcp_to_vl_mc: vl by pcp index for multicast ethernet packets
- * @non_vlan_sc_uc: sc for non-vlan unicast ethernet packets
- * @non_vlan_vl_uc: vl for non-vlan unicast ethernet packets
- * @non_vlan_sc_mc: sc for non-vlan multicast ethernet packets
- * @non_vlan_vl_mc: vl for non-vlan multicast ethernet packets
- * @rsvd2: reserved bytes
- * @uc_macs_gen_count: generation count for unicast macs list
- * @mc_macs_gen_count: generation count for multicast macs list
- * @rsvd3: reserved bytes
- */
-struct opa_per_veswport_info {
-	__be32 port_num;
-
-	u8 eth_link_status;
-	u8 rsvd0[3];
-
-	u8 base_mac_addr[ETH_ALEN];
-	u8 config_state;
-	u8 oper_state;
-
-	__be16 max_mac_tbl_ent;
-	__be16 max_smac_ent;
-	__be32 mac_tbl_digest;
-	u8 rsvd1[4];
-
-	__be32 encap_slid;
-
-	u8 pcp_to_sc_uc[OPA_VNIC_MAX_NUM_PCP];
-	u8 pcp_to_vl_uc[OPA_VNIC_MAX_NUM_PCP];
-	u8 pcp_to_sc_mc[OPA_VNIC_MAX_NUM_PCP];
-	u8 pcp_to_vl_mc[OPA_VNIC_MAX_NUM_PCP];
-
-	u8 non_vlan_sc_uc;
-	u8 non_vlan_vl_uc;
-	u8 non_vlan_sc_mc;
-	u8 non_vlan_vl_mc;
-
-	u8 rsvd2[48];
-
-	__be16 uc_macs_gen_count;
-	__be16 mc_macs_gen_count;
-
-	u8 rsvd3[8];
-} __packed;
-
-/**
- * struct opa_veswport_info - OPA vnic port information
- *
@vesw: OPA vnic switch information 218 - * @vport: OPA vnic per port information 219 - * 220 - * On host, each of the virtual ethernet ports belongs 221 - * to a different virtual ethernet switches. 222 - */ 223 - struct opa_veswport_info { 224 - struct opa_vesw_info vesw; 225 - struct opa_per_veswport_info vport; 226 - }; 227 - 228 - /** 229 - * struct opa_veswport_mactable_entry - single entry in the forwarding table 230 - * @mac_addr: MAC address 231 - * @mac_addr_mask: MAC address bit mask 232 - * @dlid_sd: Matching DLID and side data 233 - * 234 - * On the host each virtual ethernet port will have 235 - * a forwarding table. These tables are used to 236 - * map a MAC to a LID and other data. For more 237 - * details see struct opa_veswport_mactable_entries. 238 - * This is the structure of a single mactable entry 239 - */ 240 - struct opa_veswport_mactable_entry { 241 - u8 mac_addr[ETH_ALEN]; 242 - u8 mac_addr_mask[ETH_ALEN]; 243 - __be32 dlid_sd; 244 - } __packed; 245 - 246 - /** 247 - * struct opa_veswport_mactable - Forwarding table array 248 - * @offset: mac table starting offset 249 - * @num_entries: Number of entries to get or set 250 - * @mac_tbl_digest: mac table digest 251 - * @tbl_entries: Array of table entries 252 - * 253 - * The EM sends down this structure in a MAD indicating 254 - * the starting offset in the forwarding table that this 255 - * entry is to be loaded into and the number of entries 256 - * that that this MAD instance contains 257 - * The mac_tbl_digest has been added to this MAD structure. It will be set by 258 - * the EM and it will be used by the EM to check if there are any 259 - * discrepancies with this value and the value 260 - * maintained by the EM in the case of VNIC port being deleted or unloaded 261 - * A new instantiation of a VNIC will always have a value of zero. 
262 - * This value is stored as part of the vnic adapter structure and will be 263 - * accessed by the GET and SET routines for both the mactable entries and the 264 - * veswport info. 265 - */ 266 - struct opa_veswport_mactable { 267 - __be16 offset; 268 - __be16 num_entries; 269 - __be32 mac_tbl_digest; 270 - struct opa_veswport_mactable_entry tbl_entries[]; 271 - } __packed; 272 - 273 - /** 274 - * struct opa_veswport_summary_counters - summary counters 275 - * @vp_instance: vport instance on the OPA port 276 - * @vesw_id: virtual ethernet switch id 277 - * @veswport_num: virtual ethernet switch port number 278 - * @tx_errors: transmit errors 279 - * @rx_errors: receive errors 280 - * @tx_packets: transmit packets 281 - * @rx_packets: receive packets 282 - * @tx_bytes: transmit bytes 283 - * @rx_bytes: receive bytes 284 - * @tx_unicast: unicast packets transmitted 285 - * @tx_mcastbcast: multicast/broadcast packets transmitted 286 - * @tx_untagged: non-vlan packets transmitted 287 - * @tx_vlan: vlan packets transmitted 288 - * @tx_64_size: transmit packet length is 64 bytes 289 - * @tx_65_127: transmit packet length is >=65 and < 127 bytes 290 - * @tx_128_255: transmit packet length is >=128 and < 255 bytes 291 - * @tx_256_511: transmit packet length is >=256 and < 511 bytes 292 - * @tx_512_1023: transmit packet length is >=512 and < 1023 bytes 293 - * @tx_1024_1518: transmit packet length is >=1024 and < 1518 bytes 294 - * @tx_1519_max: transmit packet length >= 1519 bytes 295 - * @rx_unicast: unicast packets received 296 - * @rx_mcastbcast: multicast/broadcast packets received 297 - * @rx_untagged: non-vlan packets received 298 - * @rx_vlan: vlan packets received 299 - * @rx_64_size: received packet length is 64 bytes 300 - * @rx_65_127: received packet length is >=65 and < 127 bytes 301 - * @rx_128_255: received packet length is >=128 and < 255 bytes 302 - * @rx_256_511: received packet length is >=256 and < 511 bytes 303 - * @rx_512_1023: received packet 
length is >=512 and < 1023 bytes 304 - * @rx_1024_1518: received packet length is >=1024 and < 1518 bytes 305 - * @rx_1519_max: received packet length >= 1519 bytes 306 - * @reserved: reserved bytes 307 - * 308 - * All the above are counters of corresponding conditions. 309 - */ 310 - struct opa_veswport_summary_counters { 311 - __be16 vp_instance; 312 - __be16 vesw_id; 313 - __be32 veswport_num; 314 - 315 - __be64 tx_errors; 316 - __be64 rx_errors; 317 - __be64 tx_packets; 318 - __be64 rx_packets; 319 - __be64 tx_bytes; 320 - __be64 rx_bytes; 321 - 322 - __be64 tx_unicast; 323 - __be64 tx_mcastbcast; 324 - 325 - __be64 tx_untagged; 326 - __be64 tx_vlan; 327 - 328 - __be64 tx_64_size; 329 - __be64 tx_65_127; 330 - __be64 tx_128_255; 331 - __be64 tx_256_511; 332 - __be64 tx_512_1023; 333 - __be64 tx_1024_1518; 334 - __be64 tx_1519_max; 335 - 336 - __be64 rx_unicast; 337 - __be64 rx_mcastbcast; 338 - 339 - __be64 rx_untagged; 340 - __be64 rx_vlan; 341 - 342 - __be64 rx_64_size; 343 - __be64 rx_65_127; 344 - __be64 rx_128_255; 345 - __be64 rx_256_511; 346 - __be64 rx_512_1023; 347 - __be64 rx_1024_1518; 348 - __be64 rx_1519_max; 349 - 350 - __be64 reserved[16]; 351 - } __packed; 352 - 353 - /** 354 - * struct opa_veswport_error_counters - error counters 355 - * @vp_instance: vport instance on the OPA port 356 - * @vesw_id: virtual ethernet switch id 357 - * @veswport_num: virtual ethernet switch port number 358 - * @tx_errors: transmit errors 359 - * @rx_errors: receive errors 360 - * @rsvd0: reserved bytes 361 - * @tx_smac_filt: smac filter errors 362 - * @rsvd1: reserved bytes 363 - * @rsvd2: reserved bytes 364 - * @rsvd3: reserved bytes 365 - * @tx_dlid_zero: transmit packets with invalid dlid 366 - * @rsvd4: reserved bytes 367 - * @tx_logic: other transmit errors 368 - * @rsvd5: reserved bytes 369 - * @tx_drop_state: packet tansmission in non-forward port state 370 - * @rx_bad_veswid: received packet with invalid vesw id 371 - * @rsvd6: reserved bytes 372 - * 
@rx_runt: received ethernet packet with length < 64 bytes 373 - * @rx_oversize: received ethernet packet with length > MTU size 374 - * @rsvd7: reserved bytes 375 - * @rx_eth_down: received packets when interface is down 376 - * @rx_drop_state: received packets in non-forwarding port state 377 - * @rx_logic: other receive errors 378 - * @rsvd8: reserved bytes 379 - * @rsvd9: reserved bytes 380 - * 381 - * All the above are counters of corresponding error conditions. 382 - */ 383 - struct opa_veswport_error_counters { 384 - __be16 vp_instance; 385 - __be16 vesw_id; 386 - __be32 veswport_num; 387 - 388 - __be64 tx_errors; 389 - __be64 rx_errors; 390 - 391 - __be64 rsvd0; 392 - __be64 tx_smac_filt; 393 - __be64 rsvd1; 394 - __be64 rsvd2; 395 - __be64 rsvd3; 396 - __be64 tx_dlid_zero; 397 - __be64 rsvd4; 398 - __be64 tx_logic; 399 - __be64 rsvd5; 400 - __be64 tx_drop_state; 401 - 402 - __be64 rx_bad_veswid; 403 - __be64 rsvd6; 404 - __be64 rx_runt; 405 - __be64 rx_oversize; 406 - __be64 rsvd7; 407 - __be64 rx_eth_down; 408 - __be64 rx_drop_state; 409 - __be64 rx_logic; 410 - __be64 rsvd8; 411 - 412 - __be64 rsvd9[16]; 413 - } __packed; 414 - 415 - /** 416 - * struct opa_veswport_trap - Trap message sent to EM by VNIC 417 - * @fabric_id: 10 bit fabric id 418 - * @veswid: 12 bit virtual ethernet switch id 419 - * @veswportnum: logical port number on the Virtual switch 420 - * @opaportnum: physical port num (redundant on host) 421 - * @veswportindex: switch port index on opa port 0 based 422 - * @opcode: operation 423 - * @reserved: 32 bit for alignment 424 - * 425 - * The VNIC will send trap messages to the Ethernet manager to 426 - * inform it about changes to the VNIC config, behaviour etc. 427 - * This is the format of the trap payload. 
428 - */ 429 - struct opa_veswport_trap { 430 - __be16 fabric_id; 431 - __be16 veswid; 432 - __be32 veswportnum; 433 - __be16 opaportnum; 434 - u8 veswportindex; 435 - u8 opcode; 436 - __be32 reserved; 437 - } __packed; 438 - 439 - /** 440 - * struct opa_vnic_iface_mac_entry - single entry in the mac list 441 - * @mac_addr: MAC address 442 - */ 443 - struct opa_vnic_iface_mac_entry { 444 - u8 mac_addr[ETH_ALEN]; 445 - }; 446 - 447 - /** 448 - * struct opa_veswport_iface_macs - Msg to set globally administered MAC 449 - * @start_idx: position of first entry (0 based) 450 - * @num_macs_in_msg: number of MACs in this message 451 - * @tot_macs_in_lst: The total number of MACs the agent has 452 - * @gen_count: gen_count to indicate change 453 - * @entry: The mac list entry 454 - * 455 - * Same attribute IDS and attribute modifiers as in locally administered 456 - * addresses used to set globally administered addresses 457 - */ 458 - struct opa_veswport_iface_macs { 459 - __be16 start_idx; 460 - __be16 num_macs_in_msg; 461 - __be16 tot_macs_in_lst; 462 - __be16 gen_count; 463 - struct opa_vnic_iface_mac_entry entry[]; 464 - } __packed; 465 - 466 - /** 467 - * struct opa_vnic_vema_mad - Generic VEMA MAD 468 - * @mad_hdr: Generic MAD header 469 - * @rmpp_hdr: RMPP header for vendor specific MADs 470 - * @reserved: reserved bytes 471 - * @oui: Unique org identifier 472 - * @data: MAD data 473 - */ 474 - struct opa_vnic_vema_mad { 475 - struct ib_mad_hdr mad_hdr; 476 - struct ib_rmpp_hdr rmpp_hdr; 477 - u8 reserved; 478 - u8 oui[3]; 479 - u8 data[OPA_VNIC_EMA_DATA]; 480 - }; 481 - 482 - /** 483 - * struct opa_vnic_notice_attr - Generic Notice MAD 484 - * @gen_type: Generic/Specific bit and type of notice 485 - * @oui_1: Vendor ID byte 1 486 - * @oui_2: Vendor ID byte 2 487 - * @oui_3: Vendor ID byte 3 488 - * @trap_num: Trap number 489 - * @toggle_count: Notice toggle bit and count value 490 - * @issuer_lid: Trap issuer's lid 491 - * @reserved: reserved bytes 492 - * 
@issuer_gid: Issuer GID (only if Report method) 493 - * @raw_data: Trap message body 494 - */ 495 - struct opa_vnic_notice_attr { 496 - u8 gen_type; 497 - u8 oui_1; 498 - u8 oui_2; 499 - u8 oui_3; 500 - __be16 trap_num; 501 - __be16 toggle_count; 502 - __be32 issuer_lid; 503 - __be32 reserved; 504 - u8 issuer_gid[16]; 505 - u8 raw_data[64]; 506 - } __packed; 507 - 508 - /** 509 - * struct opa_vnic_vema_mad_trap - Generic VEMA MAD Trap 510 - * @mad_hdr: Generic MAD header 511 - * @rmpp_hdr: RMPP header for vendor specific MADs 512 - * @reserved: reserved bytes 513 - * @oui: Unique org identifier 514 - * @notice: Notice structure 515 - */ 516 - struct opa_vnic_vema_mad_trap { 517 - struct ib_mad_hdr mad_hdr; 518 - struct ib_rmpp_hdr rmpp_hdr; 519 - u8 reserved; 520 - u8 oui[3]; 521 - struct opa_vnic_notice_attr notice; 522 - }; 523 - 524 - #endif /* _OPA_VNIC_ENCAP_H */
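The removed `opa_vnic_encap.h` packs two things into 32-bit words. Each MAC table entry's `dlid_sd` holds a DLID above bit 8 plus a source-MAC flag at bit 5 (`0x20`). The `rc` word of `struct opa_vesw_info` holds 3-bit routing-control fields at the bit offsets given by the `OPA_VNIC_ENCAP_RC_*` constants. The sketch below shows how those macros decode values in user space. The macros are copied from the deleted header. `ntohl()` stands in for the kernel's `be32_to_cpu()`, and the `vnic_*` helper names and sample values are hypothetical.

```c
#include <arpa/inet.h> /* ntohl()/htonl(): POSIX stand-ins for be32_to_cpu()/cpu_to_be32() */
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Copied from the removed opa_vnic_encap.h */
#define OPA_VNIC_DLID_SD_IS_SRC_MAC(dlid_sd) (!!((dlid_sd) & 0x20))
#define OPA_VNIC_DLID_SD_GET_DLID(dlid_sd)   ((dlid_sd) >> 8)

#define OPA_VNIC_ENCAP_RC_DEFAULT   0
#define OPA_VNIC_ENCAP_RC_IPV4      4
#define OPA_VNIC_ENCAP_RC_IPV4_UDP  8
#define OPA_VNIC_ENCAP_RC_IPV4_TCP  12
#define OPA_VNIC_ENCAP_RC_IPV6      16
#define OPA_VNIC_ENCAP_RC_IPV6_TCP  20
#define OPA_VNIC_ENCAP_RC_IPV6_UDP  24

#define OPA_VNIC_ENCAP_RC_EXT(w, b) (((w) >> OPA_VNIC_ENCAP_RC_ ## b) & 0x7)

/* Hypothetical helper: DLID from a big-endian (wire order) dlid_sd field. */
uint32_t vnic_entry_dlid(uint32_t be_dlid_sd)
{
	return OPA_VNIC_DLID_SD_GET_DLID(ntohl(be_dlid_sd));
}

/* Hypothetical helper: is the source-MAC flag (bit 5) set in dlid_sd? */
bool vnic_entry_is_src_mac(uint32_t be_dlid_sd)
{
	return OPA_VNIC_DLID_SD_IS_SRC_MAC(ntohl(be_dlid_sd));
}

/* Hypothetical helper: 3-bit IPv4/TCP field of a host-order rc word. */
unsigned int vnic_rc_ipv4_tcp(uint32_t rc)
{
	unsigned int v = OPA_VNIC_ENCAP_RC_EXT(rc, IPV4_TCP);

	assert(v <= 0x7); /* the mask guarantees a 3-bit result */
	return v;
}
```

For instance, a host-order `dlid_sd` of `0x00123420` decodes to DLID `0x1234` with the source-MAC flag set. The helpers assume `rc` is already in host order, as it is in the driver's internal `struct __opa_vesw_info` copy.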

-183

drivers/infiniband/ulp/opa_vnic/opa_vnic_ethtool.c

··· 1 - /* 2 - * Copyright(c) 2017 Intel Corporation. 3 - * 4 - * This file is provided under a dual BSD/GPLv2 license. When using or 5 - * redistributing this file, you may do so under either license. 6 - * 7 - * GPL LICENSE SUMMARY 8 - * 9 - * This program is free software; you can redistribute it and/or modify 10 - * it under the terms of version 2 of the GNU General Public License as 11 - * published by the Free Software Foundation. 12 - * 13 - * This program is distributed in the hope that it will be useful, but 14 - * WITHOUT ANY WARRANTY; without even the implied warranty of 15 - * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU 16 - * General Public License for more details. 17 - * 18 - * BSD LICENSE 19 - * 20 - * Redistribution and use in source and binary forms, with or without 21 - * modification, are permitted provided that the following conditions 22 - * are met: 23 - * 24 - * - Redistributions of source code must retain the above copyright 25 - * notice, this list of conditions and the following disclaimer. 26 - * - Redistributions in binary form must reproduce the above copyright 27 - * notice, this list of conditions and the following disclaimer in 28 - * the documentation and/or other materials provided with the 29 - * distribution. 30 - * - Neither the name of Intel Corporation nor the names of its 31 - * contributors may be used to endorse or promote products derived 32 - * from this software without specific prior written permission. 33 - * 34 - * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS 35 - * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT 36 - * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR 37 - * A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT 38 - * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, 39 - * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT 40 - * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, 41 - * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY 42 - * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT 43 - * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE 44 - * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 45 - * 46 - */ 47 - 48 - /* 49 - * This file contains OPA VNIC ethtool functions 50 - */ 51 - 52 - #include <linux/ethtool.h> 53 - 54 - #include "opa_vnic_internal.h" 55 - 56 - enum {NETDEV_STATS, VNIC_STATS}; 57 - 58 - struct vnic_stats { 59 - char stat_string[ETH_GSTRING_LEN]; 60 - struct { 61 - int sizeof_stat; 62 - int stat_offset; 63 - }; 64 - }; 65 - 66 - #define VNIC_STAT(m) { sizeof_field(struct opa_vnic_stats, m), \ 67 - offsetof(struct opa_vnic_stats, m) } 68 - 69 - static struct vnic_stats vnic_gstrings_stats[] = { 70 - /* NETDEV stats */ 71 - {"rx_packets", VNIC_STAT(netstats.rx_packets)}, 72 - {"tx_packets", VNIC_STAT(netstats.tx_packets)}, 73 - {"rx_bytes", VNIC_STAT(netstats.rx_bytes)}, 74 - {"tx_bytes", VNIC_STAT(netstats.tx_bytes)}, 75 - {"rx_errors", VNIC_STAT(netstats.rx_errors)}, 76 - {"tx_errors", VNIC_STAT(netstats.tx_errors)}, 77 - {"rx_dropped", VNIC_STAT(netstats.rx_dropped)}, 78 - {"tx_dropped", VNIC_STAT(netstats.tx_dropped)}, 79 - 80 - /* SUMMARY counters */ 81 - {"tx_unicast", VNIC_STAT(tx_grp.unicast)}, 82 - {"tx_mcastbcast", VNIC_STAT(tx_grp.mcastbcast)}, 83 - {"tx_untagged", VNIC_STAT(tx_grp.untagged)}, 84 - {"tx_vlan", VNIC_STAT(tx_grp.vlan)}, 85 - 86 - {"tx_64_size", VNIC_STAT(tx_grp.s_64)}, 87 - {"tx_65_127", VNIC_STAT(tx_grp.s_65_127)}, 88 - {"tx_128_255", VNIC_STAT(tx_grp.s_128_255)}, 89 - {"tx_256_511", VNIC_STAT(tx_grp.s_256_511)}, 90 - {"tx_512_1023", 
VNIC_STAT(tx_grp.s_512_1023)}, 91 - {"tx_1024_1518", VNIC_STAT(tx_grp.s_1024_1518)}, 92 - {"tx_1519_max", VNIC_STAT(tx_grp.s_1519_max)}, 93 - 94 - {"rx_unicast", VNIC_STAT(rx_grp.unicast)}, 95 - {"rx_mcastbcast", VNIC_STAT(rx_grp.mcastbcast)}, 96 - {"rx_untagged", VNIC_STAT(rx_grp.untagged)}, 97 - {"rx_vlan", VNIC_STAT(rx_grp.vlan)}, 98 - 99 - {"rx_64_size", VNIC_STAT(rx_grp.s_64)}, 100 - {"rx_65_127", VNIC_STAT(rx_grp.s_65_127)}, 101 - {"rx_128_255", VNIC_STAT(rx_grp.s_128_255)}, 102 - {"rx_256_511", VNIC_STAT(rx_grp.s_256_511)}, 103 - {"rx_512_1023", VNIC_STAT(rx_grp.s_512_1023)}, 104 - {"rx_1024_1518", VNIC_STAT(rx_grp.s_1024_1518)}, 105 - {"rx_1519_max", VNIC_STAT(rx_grp.s_1519_max)}, 106 - 107 - /* ERROR counters */ 108 - {"rx_fifo_errors", VNIC_STAT(netstats.rx_fifo_errors)}, 109 - {"rx_length_errors", VNIC_STAT(netstats.rx_length_errors)}, 110 - 111 - {"tx_fifo_errors", VNIC_STAT(netstats.tx_fifo_errors)}, 112 - {"tx_carrier_errors", VNIC_STAT(netstats.tx_carrier_errors)}, 113 - 114 - {"tx_dlid_zero", VNIC_STAT(tx_dlid_zero)}, 115 - {"tx_drop_state", VNIC_STAT(tx_drop_state)}, 116 - {"rx_drop_state", VNIC_STAT(rx_drop_state)}, 117 - {"rx_oversize", VNIC_STAT(rx_oversize)}, 118 - {"rx_runt", VNIC_STAT(rx_runt)}, 119 - }; 120 - 121 - #define VNIC_STATS_LEN ARRAY_SIZE(vnic_gstrings_stats) 122 - 123 - /* vnic_get_drvinfo - get driver info */ 124 - static void vnic_get_drvinfo(struct net_device *netdev, 125 - struct ethtool_drvinfo *drvinfo) 126 - { 127 - strscpy(drvinfo->driver, opa_vnic_driver_name, sizeof(drvinfo->driver)); 128 - strscpy(drvinfo->bus_info, dev_name(netdev->dev.parent), 129 - sizeof(drvinfo->bus_info)); 130 - } 131 - 132 - /* vnic_get_sset_count - get string set count */ 133 - static int vnic_get_sset_count(struct net_device *netdev, int sset) 134 - { 135 - return (sset == ETH_SS_STATS) ? 
VNIC_STATS_LEN : -EOPNOTSUPP; 136 - } 137 - 138 - /* vnic_get_ethtool_stats - get statistics */ 139 - static void vnic_get_ethtool_stats(struct net_device *netdev, 140 - struct ethtool_stats *stats, u64 *data) 141 - { 142 - struct opa_vnic_adapter *adapter = opa_vnic_priv(netdev); 143 - struct opa_vnic_stats vstats; 144 - int i; 145 - 146 - memset(&vstats, 0, sizeof(vstats)); 147 - spin_lock(&adapter->stats_lock); 148 - adapter->rn_ops->ndo_get_stats64(netdev, &vstats.netstats); 149 - spin_unlock(&adapter->stats_lock); 150 - for (i = 0; i < VNIC_STATS_LEN; i++) { 151 - char *p = (char *)&vstats + vnic_gstrings_stats[i].stat_offset; 152 - 153 - data[i] = (vnic_gstrings_stats[i].sizeof_stat == 154 - sizeof(u64)) ? *(u64 *)p : *(u32 *)p; 155 - } 156 - } 157 - 158 - /* vnic_get_strings - get strings */ 159 - static void vnic_get_strings(struct net_device *netdev, u32 stringset, u8 *data) 160 - { 161 - int i; 162 - 163 - if (stringset != ETH_SS_STATS) 164 - return; 165 - 166 - for (i = 0; i < VNIC_STATS_LEN; i++) 167 - ethtool_puts(&data, vnic_gstrings_stats[i].stat_string); 168 - } 169 - 170 - /* ethtool ops */ 171 - static const struct ethtool_ops opa_vnic_ethtool_ops = { 172 - .get_drvinfo = vnic_get_drvinfo, 173 - .get_link = ethtool_op_get_link, 174 - .get_strings = vnic_get_strings, 175 - .get_sset_count = vnic_get_sset_count, 176 - .get_ethtool_stats = vnic_get_ethtool_stats, 177 - }; 178 - 179 - /* opa_vnic_set_ethtool_ops - set ethtool ops */ 180 - void opa_vnic_set_ethtool_ops(struct net_device *netdev) 181 - { 182 - netdev->ethtool_ops = &opa_vnic_ethtool_ops; 183 - }
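The removed ethtool code is table driven. `VNIC_STAT()` records each counter's size and offset with `sizeof_field()` and `offsetof()`, and `vnic_get_ethtool_stats()` walks the table, widening 32-bit counters into the `u64` output array. The sketch below is a minimal user-space version of that pattern. `struct demo_stats` and the `demo_*` names are hypothetical, and `sizeof(((T *)0)->m)` replaces the kernel-only `sizeof_field()`.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stats layout mixing 64-bit and 32-bit counters. */
struct demo_stats {
	uint64_t rx_packets;
	uint64_t tx_packets;
	uint32_t rx_runt;
};

/* Name, size and offset of one counter, as in struct vnic_stats. */
struct demo_stat_desc {
	const char *name;
	size_t size;
	size_t offset;
};

/* User-space equivalent of VNIC_STAT(): sizeof_field() + offsetof(). */
#define DEMO_STAT(m) sizeof(((struct demo_stats *)0)->m), offsetof(struct demo_stats, m)

static const struct demo_stat_desc demo_stat_descs[] = {
	{ "rx_packets", DEMO_STAT(rx_packets) },
	{ "tx_packets", DEMO_STAT(tx_packets) },
	{ "rx_runt",    DEMO_STAT(rx_runt) },
};

#define DEMO_STATS_LEN (sizeof(demo_stat_descs) / sizeof(demo_stat_descs[0]))

/* Flatten every counter into a u64 array, widening 32-bit counters. */
void demo_get_stats(const struct demo_stats *s, uint64_t *data)
{
	for (size_t i = 0; i < DEMO_STATS_LEN; i++) {
		const char *p = (const char *)s + demo_stat_descs[i].offset;

		if (demo_stat_descs[i].size == sizeof(uint64_t)) {
			uint64_t v;

			memcpy(&v, p, sizeof(v));
			data[i] = v;
		} else {
			uint32_t v;

			memcpy(&v, p, sizeof(v));
			data[i] = v;
		}
	}
}
```

The kernel code dereferences `*(u64 *)p` directly; the sketch copies through `memcpy()` instead, which sidesteps alignment and strict-aliasing concerns in portable user-space C. Because names and offsets live in one table, the string list (`get_strings`) and the value list (`get_ethtool_stats`) stay in the same order by construction.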

-329

drivers/infiniband/ulp/opa_vnic/opa_vnic_internal.h

··· 1 - #ifndef _OPA_VNIC_INTERNAL_H 2 - #define _OPA_VNIC_INTERNAL_H 3 - /* 4 - * Copyright(c) 2017 Intel Corporation. 5 - * 6 - * This file is provided under a dual BSD/GPLv2 license. When using or 7 - * redistributing this file, you may do so under either license. 8 - * 9 - * GPL LICENSE SUMMARY 10 - * 11 - * This program is free software; you can redistribute it and/or modify 12 - * it under the terms of version 2 of the GNU General Public License as 13 - * published by the Free Software Foundation. 14 - * 15 - * This program is distributed in the hope that it will be useful, but 16 - * WITHOUT ANY WARRANTY; without even the implied warranty of 17 - * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU 18 - * General Public License for more details. 19 - * 20 - * BSD LICENSE 21 - * 22 - * Redistribution and use in source and binary forms, with or without 23 - * modification, are permitted provided that the following conditions 24 - * are met: 25 - * 26 - * - Redistributions of source code must retain the above copyright 27 - * notice, this list of conditions and the following disclaimer. 28 - * - Redistributions in binary form must reproduce the above copyright 29 - * notice, this list of conditions and the following disclaimer in 30 - * the documentation and/or other materials provided with the 31 - * distribution. 32 - * - Neither the name of Intel Corporation nor the names of its 33 - * contributors may be used to endorse or promote products derived 34 - * from this software without specific prior written permission. 35 - * 36 - * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS 37 - * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT 38 - * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR 39 - * A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT 40 - * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, 41 - * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT 42 - * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, 43 - * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY 44 - * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT 45 - * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE 46 - * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 47 - * 48 - */ 49 - 50 - /* 51 - * This file contains OPA VNIC driver internal declarations 52 - */ 53 - 54 - #include <linux/bitops.h> 55 - #include <linux/etherdevice.h> 56 - #include <linux/hashtable.h> 57 - #include <linux/sizes.h> 58 - #include <rdma/opa_vnic.h> 59 - 60 - #include "opa_vnic_encap.h" 61 - 62 - #define OPA_VNIC_VLAN_PCP(vlan_tci) \ 63 - (((vlan_tci) & VLAN_PRIO_MASK) >> VLAN_PRIO_SHIFT) 64 - 65 - /* Flow to default port redirection table size */ 66 - #define OPA_VNIC_FLOW_TBL_SIZE 32 67 - 68 - /* Invalid port number */ 69 - #define OPA_VNIC_INVALID_PORT 0xff 70 - 71 - struct opa_vnic_adapter; 72 - 73 - /* 74 - * struct __opa_vesw_info - OPA vnic virtual switch info 75 - * 76 - * Same as opa_vesw_info without bitwise attribute. 77 - */ 78 - struct __opa_vesw_info { 79 - u16 fabric_id; 80 - u16 vesw_id; 81 - 82 - u8 rsvd0[6]; 83 - u16 def_port_mask; 84 - 85 - u8 rsvd1[2]; 86 - u16 pkey; 87 - 88 - u8 rsvd2[4]; 89 - u32 u_mcast_dlid; 90 - u32 u_ucast_dlid[OPA_VESW_MAX_NUM_DEF_PORT]; 91 - 92 - u32 rc; 93 - 94 - u8 rsvd3[56]; 95 - u16 eth_mtu; 96 - u8 rsvd4[2]; 97 - } __packed; 98 - 99 - /* 100 - * struct __opa_per_veswport_info - OPA vnic per port info 101 - * 102 - * Same as opa_per_veswport_info without bitwise attribute. 
103 - */ 104 - struct __opa_per_veswport_info { 105 - u32 port_num; 106 - 107 - u8 eth_link_status; 108 - u8 rsvd0[3]; 109 - 110 - u8 base_mac_addr[ETH_ALEN]; 111 - u8 config_state; 112 - u8 oper_state; 113 - 114 - u16 max_mac_tbl_ent; 115 - u16 max_smac_ent; 116 - u32 mac_tbl_digest; 117 - u8 rsvd1[4]; 118 - 119 - u32 encap_slid; 120 - 121 - u8 pcp_to_sc_uc[OPA_VNIC_MAX_NUM_PCP]; 122 - u8 pcp_to_vl_uc[OPA_VNIC_MAX_NUM_PCP]; 123 - u8 pcp_to_sc_mc[OPA_VNIC_MAX_NUM_PCP]; 124 - u8 pcp_to_vl_mc[OPA_VNIC_MAX_NUM_PCP]; 125 - 126 - u8 non_vlan_sc_uc; 127 - u8 non_vlan_vl_uc; 128 - u8 non_vlan_sc_mc; 129 - u8 non_vlan_vl_mc; 130 - 131 - u8 rsvd2[48]; 132 - 133 - u16 uc_macs_gen_count; 134 - u16 mc_macs_gen_count; 135 - 136 - u8 rsvd3[8]; 137 - } __packed; 138 - 139 - /* 140 - * struct __opa_veswport_info - OPA vnic port info 141 - * 142 - * Same as opa_veswport_info without bitwise attribute. 143 - */ 144 - struct __opa_veswport_info { 145 - struct __opa_vesw_info vesw; 146 - struct __opa_per_veswport_info vport; 147 - }; 148 - 149 - /* 150 - * struct __opa_veswport_trap - OPA vnic trap info 151 - * 152 - * Same as opa_veswport_trap without bitwise attribute. 
153 - */ 154 - struct __opa_veswport_trap { 155 - u16 fabric_id; 156 - u16 veswid; 157 - u32 veswportnum; 158 - u16 opaportnum; 159 - u8 veswportindex; 160 - u8 opcode; 161 - u32 reserved; 162 - } __packed; 163 - 164 - /** 165 - * struct opa_vnic_ctrl_port - OPA virtual NIC control port 166 - * @ibdev: pointer to ib device 167 - * @ops: opa vnic control operations 168 - * @num_ports: number of opa ports 169 - */ 170 - struct opa_vnic_ctrl_port { 171 - struct ib_device *ibdev; 172 - struct opa_vnic_ctrl_ops *ops; 173 - u8 num_ports; 174 - }; 175 - 176 - /** 177 - * struct opa_vnic_adapter - OPA VNIC netdev private data structure 178 - * @netdev: pointer to associated netdev 179 - * @ibdev: ib device 180 - * @cport: pointer to opa vnic control port 181 - * @rn_ops: rdma netdev's net_device_ops 182 - * @port_num: OPA port number 183 - * @vport_num: vesw port number 184 - * @lock: adapter lock 185 - * @info: virtual ethernet switch port information 186 - * @vema_mac_addr: mac address configured by vema 187 - * @umac_hash: unicast maclist hash 188 - * @mmac_hash: multicast maclist hash 189 - * @mactbl: hash table of MAC entries 190 - * @mactbl_lock: mac table lock 191 - * @stats_lock: statistics lock 192 - * @flow_tbl: flow to default port redirection table 193 - * @trap_timeout: trap timeout 194 - * @trap_count: no. 
of traps allowed within timeout period 195 - */ 196 - struct opa_vnic_adapter { 197 - struct net_device *netdev; 198 - struct ib_device *ibdev; 199 - struct opa_vnic_ctrl_port *cport; 200 - const struct net_device_ops *rn_ops; 201 - 202 - u8 port_num; 203 - u8 vport_num; 204 - 205 - /* Lock used around concurrent updates to netdev */ 206 - struct mutex lock; 207 - 208 - struct __opa_veswport_info info; 209 - u8 vema_mac_addr[ETH_ALEN]; 210 - u32 umac_hash; 211 - u32 mmac_hash; 212 - struct hlist_head __rcu *mactbl; 213 - 214 - /* Lock used to protect updates to mac table */ 215 - struct mutex mactbl_lock; 216 - 217 - /* Lock used to protect access to vnic counters */ 218 - spinlock_t stats_lock; 219 - 220 - u8 flow_tbl[OPA_VNIC_FLOW_TBL_SIZE]; 221 - 222 - unsigned long trap_timeout; 223 - u8 trap_count; 224 - }; 225 - 226 - /* Same as opa_veswport_mactable_entry, but without bitwise attribute */ 227 - struct __opa_vnic_mactable_entry { 228 - u8 mac_addr[ETH_ALEN]; 229 - u8 mac_addr_mask[ETH_ALEN]; 230 - u32 dlid_sd; 231 - } __packed; 232 - 233 - /** 234 - * struct opa_vnic_mac_tbl_node - OPA VNIC mac table node 235 - * @hlist: hash list handle 236 - * @index: index of entry in the mac table 237 - * @entry: entry in the table 238 - */ 239 - struct opa_vnic_mac_tbl_node { 240 - struct hlist_node hlist; 241 - u16 index; 242 - struct __opa_vnic_mactable_entry entry; 243 - }; 244 - 245 - #define v_dbg(format, arg...) \ 246 - netdev_dbg(adapter->netdev, format, ## arg) 247 - #define v_err(format, arg...) \ 248 - netdev_err(adapter->netdev, format, ## arg) 249 - #define v_info(format, arg...) \ 250 - netdev_info(adapter->netdev, format, ## arg) 251 - #define v_warn(format, arg...) \ 252 - netdev_warn(adapter->netdev, format, ## arg) 253 - 254 - #define c_err(format, arg...) \ 255 - dev_err(&cport->ibdev->dev, format, ## arg) 256 - #define c_info(format, arg...) \ 257 - dev_info(&cport->ibdev->dev, format, ## arg) 258 - #define c_dbg(format, arg...) 
\ 259 - dev_dbg(&cport->ibdev->dev, format, ## arg) 260 - 261 - /* The maximum allowed entries in the mac table */ 262 - #define OPA_VNIC_MAC_TBL_MAX_ENTRIES 2048 263 - /* Limit of smac entries in mac table */ 264 - #define OPA_VNIC_MAX_SMAC_LIMIT 256 265 - 266 - /* The last octet of the MAC address is used as the key to the hash table */ 267 - #define OPA_VNIC_MAC_HASH_IDX 5 268 - 269 - /* The VNIC MAC hash table is of size 2^8 */ 270 - #define OPA_VNIC_MAC_TBL_HASH_BITS 8 271 - #define OPA_VNIC_MAC_TBL_SIZE BIT(OPA_VNIC_MAC_TBL_HASH_BITS) 272 - 273 - /* VNIC HASH MACROS */ 274 - #define vnic_hash_init(hashtable) __hash_init(hashtable, OPA_VNIC_MAC_TBL_SIZE) 275 - 276 - #define vnic_hash_add(hashtable, node, key) \ 277 - hlist_add_head(node, \ 278 - &hashtable[hash_min(key, ilog2(OPA_VNIC_MAC_TBL_SIZE))]) 279 - 280 - #define vnic_hash_for_each_safe(name, bkt, tmp, obj, member) \ 281 - for ((bkt) = 0, obj = NULL; \ 282 - !obj && (bkt) < OPA_VNIC_MAC_TBL_SIZE; (bkt)++) \ 283 - hlist_for_each_entry_safe(obj, tmp, &name[bkt], member) 284 - 285 - #define vnic_hash_for_each_possible(name, obj, member, key) \ 286 - hlist_for_each_entry(obj, \ 287 - &name[hash_min(key, ilog2(OPA_VNIC_MAC_TBL_SIZE))], member) 288 - 289 - #define vnic_hash_for_each(name, bkt, obj, member) \ 290 - for ((bkt) = 0, obj = NULL; \ 291 - !obj && (bkt) < OPA_VNIC_MAC_TBL_SIZE; (bkt)++) \ 292 - hlist_for_each_entry(obj, &name[bkt], member) 293 - 294 - extern char opa_vnic_driver_name[]; 295 - 296 - struct opa_vnic_adapter *opa_vnic_add_netdev(struct ib_device *ibdev, 297 - u8 port_num, u8 vport_num); 298 - void opa_vnic_rem_netdev(struct opa_vnic_adapter *adapter); 299 - void opa_vnic_encap_skb(struct opa_vnic_adapter *adapter, struct sk_buff *skb); 300 - u8 opa_vnic_get_vl(struct opa_vnic_adapter *adapter, struct sk_buff *skb); 301 - u8 opa_vnic_calc_entropy(struct sk_buff *skb); 302 - void opa_vnic_process_vema_config(struct opa_vnic_adapter *adapter); 303 - void opa_vnic_release_mac_tbl(struct 
opa_vnic_adapter *adapter); 304 - void opa_vnic_query_mac_tbl(struct opa_vnic_adapter *adapter, 305 - struct opa_veswport_mactable *tbl); 306 - int opa_vnic_update_mac_tbl(struct opa_vnic_adapter *adapter, 307 - struct opa_veswport_mactable *tbl); 308 - void opa_vnic_query_ucast_macs(struct opa_vnic_adapter *adapter, 309 - struct opa_veswport_iface_macs *macs); 310 - void opa_vnic_query_mcast_macs(struct opa_vnic_adapter *adapter, 311 - struct opa_veswport_iface_macs *macs); 312 - void opa_vnic_get_summary_counters(struct opa_vnic_adapter *adapter, 313 - struct opa_veswport_summary_counters *cntrs); 314 - void opa_vnic_get_error_counters(struct opa_vnic_adapter *adapter, 315 - struct opa_veswport_error_counters *cntrs); 316 - void opa_vnic_get_vesw_info(struct opa_vnic_adapter *adapter, 317 - struct opa_vesw_info *info); 318 - void opa_vnic_set_vesw_info(struct opa_vnic_adapter *adapter, 319 - struct opa_vesw_info *info); 320 - void opa_vnic_get_per_veswport_info(struct opa_vnic_adapter *adapter, 321 - struct opa_per_veswport_info *info); 322 - void opa_vnic_set_per_veswport_info(struct opa_vnic_adapter *adapter, 323 - struct opa_per_veswport_info *info); 324 - void opa_vnic_vema_report_event(struct opa_vnic_adapter *adapter, u8 event); 325 - void opa_vnic_set_ethtool_ops(struct net_device *netdev); 326 - void opa_vnic_vema_send_trap(struct opa_vnic_adapter *adapter, 327 - struct __opa_veswport_trap *data, u32 lid); 328 - 329 - #endif /* _OPA_VNIC_INTERNAL_H */
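According to the removed `opa_vnic_internal.h`, the MAC table is keyed on the last octet of the address (`OPA_VNIC_MAC_HASH_IDX` is 5), and `vnic_hash_add()` picks one of 2^8 buckets with `hash_min(key, ilog2(OPA_VNIC_MAC_TBL_SIZE))`. The sketch below computes that bucket in user space. `demo_hash_32()` re-implements the generic `hash_32()` from `include/linux/hash.h` (multiply by `GOLDEN_RATIO_32`, keep the top `bits` bits), assuming no architecture-specific `__hash_32` override. The `demo_*` names are hypothetical.

```c
#include <stdint.h>

/* Values taken from the removed opa_vnic_internal.h */
#define OPA_VNIC_MAC_HASH_IDX      5
#define OPA_VNIC_MAC_TBL_HASH_BITS 8
#define OPA_VNIC_MAC_TBL_SIZE      (1u << OPA_VNIC_MAC_TBL_HASH_BITS)

/* Multiplier used by the kernel's generic hash_32(). */
#define GOLDEN_RATIO_32 0x61C88647u

/* Generic multiplicative hash: keep the top 'bits' bits of val * GOLDEN_RATIO_32. */
static uint32_t demo_hash_32(uint32_t val, unsigned int bits)
{
	return (uint32_t)(val * GOLDEN_RATIO_32) >> (32 - bits);
}

/* Bucket index for a MAC address: only the last octet feeds the hash. */
uint32_t demo_mac_bucket(const uint8_t mac[6])
{
	uint8_t key = mac[OPA_VNIC_MAC_HASH_IDX];

	return demo_hash_32(key, OPA_VNIC_MAC_TBL_HASH_BITS);
}
```

Since only one octet is hashed, two MACs that share a last octet always land in the same bucket regardless of their other bytes; the per-bucket `hlist` chains absorb those collisions.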

-400

drivers/infiniband/ulp/opa_vnic/opa_vnic_netdev.c

··· 1 - /* 2 - * Copyright(c) 2017 Intel Corporation. 3 - * 4 - * This file is provided under a dual BSD/GPLv2 license. When using or 5 - * redistributing this file, you may do so under either license. 6 - * 7 - * GPL LICENSE SUMMARY 8 - * 9 - * This program is free software; you can redistribute it and/or modify 10 - * it under the terms of version 2 of the GNU General Public License as 11 - * published by the Free Software Foundation. 12 - * 13 - * This program is distributed in the hope that it will be useful, but 14 - * WITHOUT ANY WARRANTY; without even the implied warranty of 15 - * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU 16 - * General Public License for more details. 17 - * 18 - * BSD LICENSE 19 - * 20 - * Redistribution and use in source and binary forms, with or without 21 - * modification, are permitted provided that the following conditions 22 - * are met: 23 - * 24 - * - Redistributions of source code must retain the above copyright 25 - * notice, this list of conditions and the following disclaimer. 26 - * - Redistributions in binary form must reproduce the above copyright 27 - * notice, this list of conditions and the following disclaimer in 28 - * the documentation and/or other materials provided with the 29 - * distribution. 30 - * - Neither the name of Intel Corporation nor the names of its 31 - * contributors may be used to endorse or promote products derived 32 - * from this software without specific prior written permission. 33 - * 34 - * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS 35 - * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT 36 - * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR 37 - * A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT 38 - * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, 39 - * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT 40 - * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, 41 - * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY 42 - * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT 43 - * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE 44 - * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 45 - * 46 - */ 47 - 48 - /* 49 - * This file contains OPA Virtual Network Interface Controller (VNIC) driver 50 - * netdev functionality. 51 - */ 52 - 53 - #include <linux/if_vlan.h> 54 - #include <linux/crc32.h> 55 - 56 - #include "opa_vnic_internal.h" 57 - 58 - #define OPA_TX_TIMEOUT_MS 1000 59 - 60 - #define OPA_VNIC_SKB_HEADROOM \ 61 - ALIGN((OPA_VNIC_HDR_LEN + OPA_VNIC_SKB_MDATA_LEN), 8) 62 - 63 - /* This function is overloaded for opa_vnic specific implementation */ 64 - static void opa_vnic_get_stats64(struct net_device *netdev, 65 - struct rtnl_link_stats64 *stats) 66 - { 67 - struct opa_vnic_adapter *adapter = opa_vnic_priv(netdev); 68 - struct opa_vnic_stats vstats; 69 - 70 - memset(&vstats, 0, sizeof(vstats)); 71 - spin_lock(&adapter->stats_lock); 72 - adapter->rn_ops->ndo_get_stats64(netdev, &vstats.netstats); 73 - spin_unlock(&adapter->stats_lock); 74 - memcpy(stats, &vstats.netstats, sizeof(*stats)); 75 - } 76 - 77 - /* opa_netdev_start_xmit - transmit function */ 78 - static netdev_tx_t opa_netdev_start_xmit(struct sk_buff *skb, 79 - struct net_device *netdev) 80 - { 81 - struct opa_vnic_adapter *adapter = opa_vnic_priv(netdev); 82 - 83 - v_dbg("xmit: queue %d skb len %d\n", skb->queue_mapping, skb->len); 84 - /* pad to ensure mininum ethernet packet length */ 85 - if (unlikely(skb->len < ETH_ZLEN)) { 86 - if (skb_padto(skb, ETH_ZLEN)) 87 - return NETDEV_TX_OK; 88 - 89 - skb_put(skb, ETH_ZLEN - 
skb->len); 90 - } 91 - 92 - opa_vnic_encap_skb(adapter, skb); 93 - return adapter->rn_ops->ndo_start_xmit(skb, netdev); 94 - } 95 - 96 - static u16 opa_vnic_select_queue(struct net_device *netdev, struct sk_buff *skb, 97 - struct net_device *sb_dev) 98 - { 99 - struct opa_vnic_adapter *adapter = opa_vnic_priv(netdev); 100 - struct opa_vnic_skb_mdata *mdata; 101 - int rc; 102 - 103 - /* pass entropy and vl as metadata in skb */ 104 - mdata = skb_push(skb, sizeof(*mdata)); 105 - mdata->entropy = opa_vnic_calc_entropy(skb); 106 - mdata->vl = opa_vnic_get_vl(adapter, skb); 107 - rc = adapter->rn_ops->ndo_select_queue(netdev, skb, sb_dev); 108 - skb_pull(skb, sizeof(*mdata)); 109 - return rc; 110 - } 111 - 112 - static void opa_vnic_update_state(struct opa_vnic_adapter *adapter, bool up) 113 - { 114 - struct __opa_veswport_info *info = &adapter->info; 115 - 116 - mutex_lock(&adapter->lock); 117 - /* Operational state can only be DROP_ALL or FORWARDING */ 118 - if ((info->vport.config_state == OPA_VNIC_STATE_FORWARDING) && up) { 119 - info->vport.oper_state = OPA_VNIC_STATE_FORWARDING; 120 - info->vport.eth_link_status = OPA_VNIC_ETH_LINK_UP; 121 - } else { 122 - info->vport.oper_state = OPA_VNIC_STATE_DROP_ALL; 123 - info->vport.eth_link_status = OPA_VNIC_ETH_LINK_DOWN; 124 - } 125 - 126 - if (info->vport.config_state == OPA_VNIC_STATE_FORWARDING) 127 - netif_dormant_off(adapter->netdev); 128 - else 129 - netif_dormant_on(adapter->netdev); 130 - mutex_unlock(&adapter->lock); 131 - } 132 - 133 - /* opa_vnic_process_vema_config - process vema configuration updates */ 134 - void opa_vnic_process_vema_config(struct opa_vnic_adapter *adapter) 135 - { 136 - struct __opa_veswport_info *info = &adapter->info; 137 - struct rdma_netdev *rn = netdev_priv(adapter->netdev); 138 - u8 port_num[OPA_VESW_MAX_NUM_DEF_PORT] = { 0 }; 139 - struct net_device *netdev = adapter->netdev; 140 - u8 i, port_count = 0; 141 - u16 port_mask; 142 - 143 - /* If the base_mac_addr is changed, update the 
interface mac address */ 144 - if (memcmp(info->vport.base_mac_addr, adapter->vema_mac_addr, 145 - ARRAY_SIZE(info->vport.base_mac_addr))) { 146 - struct sockaddr saddr; 147 - 148 - memcpy(saddr.sa_data, info->vport.base_mac_addr, 149 - ARRAY_SIZE(info->vport.base_mac_addr)); 150 - mutex_lock(&adapter->lock); 151 - eth_commit_mac_addr_change(netdev, &saddr); 152 - memcpy(adapter->vema_mac_addr, 153 - info->vport.base_mac_addr, ETH_ALEN); 154 - mutex_unlock(&adapter->lock); 155 - } 156 - 157 - rn->set_id(netdev, info->vesw.vesw_id); 158 - 159 - /* Handle MTU limit change */ 160 - rtnl_lock(); 161 - netdev->max_mtu = max_t(unsigned int, info->vesw.eth_mtu, 162 - netdev->min_mtu); 163 - if (netdev->mtu > netdev->max_mtu) 164 - dev_set_mtu(netdev, netdev->max_mtu); 165 - rtnl_unlock(); 166 - 167 - /* Update flow to default port redirection table */ 168 - port_mask = info->vesw.def_port_mask; 169 - for (i = 0; i < OPA_VESW_MAX_NUM_DEF_PORT; i++) { 170 - if (port_mask & 1) 171 - port_num[port_count++] = i; 172 - port_mask >>= 1; 173 - } 174 - 175 - /* 176 - * Build the flow table. Flow table is required when destination LID 177 - * is not available. Up to OPA_VNIC_FLOW_TBL_SIZE flows supported. 178 - * Each flow need a default port number to get its dlid from the 179 - * u_ucast_dlid array. 180 - */ 181 - for (i = 0; i < OPA_VNIC_FLOW_TBL_SIZE; i++) 182 - adapter->flow_tbl[i] = port_count ? port_num[i % port_count] : 183 - OPA_VNIC_INVALID_PORT; 184 - 185 - /* update state */ 186 - opa_vnic_update_state(adapter, !!(netdev->flags & IFF_UP)); 187 - } 188 - 189 - /* 190 - * Set the power on default values in adapter's vema interface structure. 
191 - */ 192 - static inline void opa_vnic_set_pod_values(struct opa_vnic_adapter *adapter) 193 - { 194 - adapter->info.vport.max_mac_tbl_ent = OPA_VNIC_MAC_TBL_MAX_ENTRIES; 195 - adapter->info.vport.max_smac_ent = OPA_VNIC_MAX_SMAC_LIMIT; 196 - adapter->info.vport.config_state = OPA_VNIC_STATE_DROP_ALL; 197 - adapter->info.vport.eth_link_status = OPA_VNIC_ETH_LINK_DOWN; 198 - adapter->info.vesw.eth_mtu = ETH_DATA_LEN; 199 - } 200 - 201 - /* opa_vnic_set_mac_addr - change mac address */ 202 - static int opa_vnic_set_mac_addr(struct net_device *netdev, void *addr) 203 - { 204 - struct opa_vnic_adapter *adapter = opa_vnic_priv(netdev); 205 - struct sockaddr *sa = addr; 206 - int rc; 207 - 208 - if (!memcmp(netdev->dev_addr, sa->sa_data, ETH_ALEN)) 209 - return 0; 210 - 211 - mutex_lock(&adapter->lock); 212 - rc = eth_mac_addr(netdev, addr); 213 - mutex_unlock(&adapter->lock); 214 - if (rc) 215 - return rc; 216 - 217 - adapter->info.vport.uc_macs_gen_count++; 218 - opa_vnic_vema_report_event(adapter, 219 - OPA_VESWPORT_TRAP_IFACE_UCAST_MAC_CHANGE); 220 - return 0; 221 - } 222 - 223 - /* 224 - * opa_vnic_mac_send_event - post event on possible mac list exchange 225 - * Send trap when digest from uc/mc mac list differs from previous run. 226 - * Digest is evaluated similar to how cksum does. 
227 - */ 228 - static void opa_vnic_mac_send_event(struct net_device *netdev, u8 event) 229 - { 230 - struct opa_vnic_adapter *adapter = opa_vnic_priv(netdev); 231 - struct netdev_hw_addr *ha; 232 - struct netdev_hw_addr_list *hw_list; 233 - u32 *ref_crc; 234 - u32 l, crc = 0; 235 - 236 - switch (event) { 237 - case OPA_VESWPORT_TRAP_IFACE_UCAST_MAC_CHANGE: 238 - hw_list = &netdev->uc; 239 - adapter->info.vport.uc_macs_gen_count++; 240 - ref_crc = &adapter->umac_hash; 241 - break; 242 - case OPA_VESWPORT_TRAP_IFACE_MCAST_MAC_CHANGE: 243 - hw_list = &netdev->mc; 244 - adapter->info.vport.mc_macs_gen_count++; 245 - ref_crc = &adapter->mmac_hash; 246 - break; 247 - default: 248 - return; 249 - } 250 - netdev_hw_addr_list_for_each(ha, hw_list) { 251 - crc = crc32_le(crc, ha->addr, ETH_ALEN); 252 - } 253 - l = netdev_hw_addr_list_count(hw_list) * ETH_ALEN; 254 - crc = ~crc32_le(crc, (void *)&l, sizeof(l)); 255 - 256 - if (crc != *ref_crc) { 257 - *ref_crc = crc; 258 - opa_vnic_vema_report_event(adapter, event); 259 - } 260 - } 261 - 262 - /* opa_vnic_set_rx_mode - handle uc/mc mac list change */ 263 - static void opa_vnic_set_rx_mode(struct net_device *netdev) 264 - { 265 - opa_vnic_mac_send_event(netdev, 266 - OPA_VESWPORT_TRAP_IFACE_UCAST_MAC_CHANGE); 267 - 268 - opa_vnic_mac_send_event(netdev, 269 - OPA_VESWPORT_TRAP_IFACE_MCAST_MAC_CHANGE); 270 - } 271 - 272 - /* opa_netdev_open - activate network interface */ 273 - static int opa_netdev_open(struct net_device *netdev) 274 - { 275 - struct opa_vnic_adapter *adapter = opa_vnic_priv(netdev); 276 - int rc; 277 - 278 - rc = adapter->rn_ops->ndo_open(adapter->netdev); 279 - if (rc) { 280 - v_dbg("open failed %d\n", rc); 281 - return rc; 282 - } 283 - 284 - /* Update status and send trap */ 285 - opa_vnic_update_state(adapter, true); 286 - opa_vnic_vema_report_event(adapter, 287 - OPA_VESWPORT_TRAP_ETH_LINK_STATUS_CHANGE); 288 - return 0; 289 - } 290 - 291 - /* opa_netdev_close - disable network interface */ 292 - static 
int opa_netdev_close(struct net_device *netdev) 293 - { 294 - struct opa_vnic_adapter *adapter = opa_vnic_priv(netdev); 295 - int rc; 296 - 297 - rc = adapter->rn_ops->ndo_stop(adapter->netdev); 298 - if (rc) { 299 - v_dbg("close failed %d\n", rc); 300 - return rc; 301 - } 302 - 303 - /* Update status and send trap */ 304 - opa_vnic_update_state(adapter, false); 305 - opa_vnic_vema_report_event(adapter, 306 - OPA_VESWPORT_TRAP_ETH_LINK_STATUS_CHANGE); 307 - return 0; 308 - } 309 - 310 - /* netdev ops */ 311 - static const struct net_device_ops opa_netdev_ops = { 312 - .ndo_open = opa_netdev_open, 313 - .ndo_stop = opa_netdev_close, 314 - .ndo_start_xmit = opa_netdev_start_xmit, 315 - .ndo_get_stats64 = opa_vnic_get_stats64, 316 - .ndo_set_rx_mode = opa_vnic_set_rx_mode, 317 - .ndo_select_queue = opa_vnic_select_queue, 318 - .ndo_set_mac_address = opa_vnic_set_mac_addr, 319 - }; 320 - 321 - /* opa_vnic_add_netdev - create vnic netdev interface */ 322 - struct opa_vnic_adapter *opa_vnic_add_netdev(struct ib_device *ibdev, 323 - u8 port_num, u8 vport_num) 324 - { 325 - struct opa_vnic_adapter *adapter; 326 - struct net_device *netdev; 327 - struct rdma_netdev *rn; 328 - int rc; 329 - 330 - netdev = ibdev->ops.alloc_rdma_netdev(ibdev, port_num, 331 - RDMA_NETDEV_OPA_VNIC, 332 - "veth%d", NET_NAME_UNKNOWN, 333 - ether_setup); 334 - if (!netdev) 335 - return ERR_PTR(-ENOMEM); 336 - else if (IS_ERR(netdev)) 337 - return ERR_CAST(netdev); 338 - 339 - rn = netdev_priv(netdev); 340 - adapter = kzalloc_obj(*adapter); 341 - if (!adapter) { 342 - rc = -ENOMEM; 343 - goto adapter_err; 344 - } 345 - 346 - rn->clnt_priv = adapter; 347 - rn->hca = ibdev; 348 - rn->port_num = port_num; 349 - adapter->netdev = netdev; 350 - adapter->ibdev = ibdev; 351 - adapter->port_num = port_num; 352 - adapter->vport_num = vport_num; 353 - adapter->rn_ops = netdev->netdev_ops; 354 - 355 - netdev->netdev_ops = &opa_netdev_ops; 356 - netdev->priv_flags |= IFF_LIVE_ADDR_CHANGE; 357 - 
netdev->hard_header_len += OPA_VNIC_SKB_HEADROOM; 358 - mutex_init(&adapter->lock); 359 - mutex_init(&adapter->mactbl_lock); 360 - spin_lock_init(&adapter->stats_lock); 361 - 362 - SET_NETDEV_DEV(netdev, ibdev->dev.parent); 363 - 364 - opa_vnic_set_ethtool_ops(netdev); 365 - 366 - opa_vnic_set_pod_values(adapter); 367 - 368 - rc = register_netdev(netdev); 369 - if (rc) 370 - goto netdev_err; 371 - 372 - netif_carrier_off(netdev); 373 - netif_dormant_on(netdev); 374 - v_info("initialized\n"); 375 - 376 - return adapter; 377 - netdev_err: 378 - mutex_destroy(&adapter->lock); 379 - mutex_destroy(&adapter->mactbl_lock); 380 - kfree(adapter); 381 - adapter_err: 382 - rn->free_rdma_netdev(netdev); 383 - 384 - return ERR_PTR(rc); 385 - } 386 - 387 - /* opa_vnic_rem_netdev - remove vnic netdev interface */ 388 - void opa_vnic_rem_netdev(struct opa_vnic_adapter *adapter) 389 - { 390 - struct net_device *netdev = adapter->netdev; 391 - struct rdma_netdev *rn = netdev_priv(netdev); 392 - 393 - v_info("removing\n"); 394 - unregister_netdev(netdev); 395 - opa_vnic_release_mac_tbl(adapter); 396 - mutex_destroy(&adapter->lock); 397 - mutex_destroy(&adapter->mactbl_lock); 398 - kfree(adapter); 399 - rn->free_rdma_netdev(netdev); 400 - }
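For reference, the removed opa_vnic_process_vema_config() built its flow table in two steps. It collected the default ports whose bits are set in vesw.def_port_mask, then assigned them round-robin across all flow table entries. If no default port was set, it marked every entry as invalid. The userspace sketch below reproduces that loop. The constant values and the name build_flow_tbl() are assumptions for illustration only. The real constants are defined in opa_vnic_internal.h, which this view does not show.

```c
#include <stdint.h>

/* Assumed values for illustration; not taken from this diff. */
#define MAX_NUM_DEF_PORT 16
#define FLOW_TBL_SIZE    32
#define INVALID_PORT     0xff

/*
 * Hypothetical helper mirroring the removed loop: gather the indices of the
 * set bits in port_mask, then spread them round-robin over the flow table.
 */
static void build_flow_tbl(uint16_t port_mask, uint8_t flow_tbl[FLOW_TBL_SIZE])
{
	uint8_t port_num[MAX_NUM_DEF_PORT] = { 0 };
	uint8_t i, port_count = 0;

	for (i = 0; i < MAX_NUM_DEF_PORT; i++) {
		if (port_mask & 1)
			port_num[port_count++] = i;
		port_mask >>= 1;
	}

	for (i = 0; i < FLOW_TBL_SIZE; i++)
		flow_tbl[i] = port_count ? port_num[i % port_count] :
					   INVALID_PORT;
}
```

With a mask of 0x0005, default ports 0 and 2 are set, so the entries alternate 0, 2, 0, 2, and so on. With a mask of 0, every entry is INVALID_PORT.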

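The removed opa_vnic_mac_send_event() decided whether to send a MAC-change trap by comparing a CRC32 digest of the unicast or multicast list against the previous digest. The digest covered every address, then the list length in bytes, and the result was inverted, similar to cksum. The userspace sketch below illustrates that scheme. crc32_le_bitwise() is a hypothetical stand-in that assumes the kernel's crc32_le() semantics: reflected polynomial 0xEDB88320, a caller-supplied seed, and no built-in pre- or post-inversion. mac_list_digest() and digest_changed() are also illustrative names.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define ETH_ALEN 6

/* Hypothetical stand-in for crc32_le(): reflected CRC32, no inversion. */
static uint32_t crc32_le_bitwise(uint32_t crc, const uint8_t *p, size_t len)
{
	while (len--) {
		crc ^= *p++;
		for (int k = 0; k < 8; k++)
			crc = (crc >> 1) ^ (0xEDB88320u & -(crc & 1u));
	}
	return crc;
}

/* CRC over each address, then over the list length in bytes
 * (host byte order), then inverted. */
static uint32_t mac_list_digest(const uint8_t (*addrs)[ETH_ALEN], uint32_t count)
{
	uint32_t crc = 0;
	uint32_t len = count * ETH_ALEN;

	for (uint32_t i = 0; i < count; i++)
		crc = crc32_le_bitwise(crc, addrs[i], ETH_ALEN);
	return ~crc32_le_bitwise(crc, (const uint8_t *)&len, sizeof(len));
}

/* True (and the stored digest updated) when the list changed; this is the
 * point at which the driver reported the trap event. */
static bool digest_changed(uint32_t *ref, uint32_t digest)
{
	if (digest == *ref)
		return false;
	*ref = digest;
	return true;
}
```

Because the stored digest is updated only when it differs, repeated ndo_set_rx_mode calls with an unchanged list produce no further traps in this scheme.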
-1056

drivers/infiniband/ulp/opa_vnic/opa_vnic_vema.c

··· 1 - /* 2 - * Copyright(c) 2017 Intel Corporation. 3 - * Copyright(c) 2021 Cornelis Networks. 4 - * 5 - * This file is provided under a dual BSD/GPLv2 license. When using or 6 - * redistributing this file, you may do so under either license. 7 - * 8 - * GPL LICENSE SUMMARY 9 - * 10 - * This program is free software; you can redistribute it and/or modify 11 - * it under the terms of version 2 of the GNU General Public License as 12 - * published by the Free Software Foundation. 13 - * 14 - * This program is distributed in the hope that it will be useful, but 15 - * WITHOUT ANY WARRANTY; without even the implied warranty of 16 - * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU 17 - * General Public License for more details. 18 - * 19 - * BSD LICENSE 20 - * 21 - * Redistribution and use in source and binary forms, with or without 22 - * modification, are permitted provided that the following conditions 23 - * are met: 24 - * 25 - * - Redistributions of source code must retain the above copyright 26 - * notice, this list of conditions and the following disclaimer. 27 - * - Redistributions in binary form must reproduce the above copyright 28 - * notice, this list of conditions and the following disclaimer in 29 - * the documentation and/or other materials provided with the 30 - * distribution. 31 - * - Neither the name of Intel Corporation nor the names of its 32 - * contributors may be used to endorse or promote products derived 33 - * from this software without specific prior written permission. 34 - * 35 - * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS 36 - * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT 37 - * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR 38 - * A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT 39 - * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, 40 - * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT 41 - * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, 42 - * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY 43 - * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT 44 - * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE 45 - * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 46 - * 47 - */ 48 - 49 - /* 50 - * This file contains OPX Virtual Network Interface Controller (VNIC) 51 - * Ethernet Management Agent (EMA) driver 52 - */ 53 - 54 - #include <linux/module.h> 55 - #include <linux/xarray.h> 56 - #include <rdma/ib_addr.h> 57 - #include <rdma/ib_verbs.h> 58 - #include <rdma/opa_smi.h> 59 - #include <rdma/opa_port_info.h> 60 - 61 - #include "opa_vnic_internal.h" 62 - 63 - char opa_vnic_driver_name[] = "opa_vnic"; 64 - 65 - /* 66 - * The trap service level is kept in bits 3 to 7 in the trap_sl_rsvd 67 - * field in the class port info MAD. 68 - */ 69 - #define GET_TRAP_SL_FROM_CLASS_PORT_INFO(x) (((x) >> 3) & 0x1f) 70 - 71 - /* Cap trap bursts to a reasonable limit good for normal cases */ 72 - #define OPA_VNIC_TRAP_BURST_LIMIT 4 73 - 74 - /* 75 - * VNIC trap limit timeout. 76 - * Inverse of cap2_mask response time out (1.0737 secs) = 0.9 77 - * secs approx IB spec 13.4.6.2.1 PortInfoSubnetTimeout and 78 - * 13.4.9 Traps. 
79 - */ 80 - #define OPA_VNIC_TRAP_TIMEOUT ((4096 * (1UL << 18)) / 1000) 81 - 82 - #define OPA_VNIC_UNSUP_ATTR \ 83 - cpu_to_be16(IB_MGMT_MAD_STATUS_UNSUPPORTED_METHOD_ATTRIB) 84 - 85 - #define OPA_VNIC_INVAL_ATTR \ 86 - cpu_to_be16(IB_MGMT_MAD_STATUS_INVALID_ATTRIB_VALUE) 87 - 88 - #define OPA_VNIC_CLASS_CAP_TRAP 0x1 89 - 90 - /* Maximum number of VNIC ports supported */ 91 - #define OPA_VNIC_MAX_NUM_VPORT 255 92 - 93 - /** 94 - * struct opa_vnic_vema_port -- VNIC VEMA port details 95 - * @cport: pointer to port 96 - * @mad_agent: pointer to mad agent for port 97 - * @class_port_info: Class port info information. 98 - * @tid: Transaction id 99 - * @port_num: OPA port number 100 - * @vports: vnic ports 101 - * @event_handler: ib event handler 102 - * @lock: adapter interface lock 103 - */ 104 - struct opa_vnic_vema_port { 105 - struct opa_vnic_ctrl_port *cport; 106 - struct ib_mad_agent *mad_agent; 107 - struct opa_class_port_info class_port_info; 108 - u64 tid; 109 - u8 port_num; 110 - struct xarray vports; 111 - struct ib_event_handler event_handler; 112 - 113 - /* Lock to query/update network adapter */ 114 - struct mutex lock; 115 - }; 116 - 117 - static int opa_vnic_vema_add_one(struct ib_device *device); 118 - static void opa_vnic_vema_rem_one(struct ib_device *device, 119 - void *client_data); 120 - 121 - static struct ib_client opa_vnic_client = { 122 - .name = opa_vnic_driver_name, 123 - .add = opa_vnic_vema_add_one, 124 - .remove = opa_vnic_vema_rem_one, 125 - }; 126 - 127 - /** 128 - * vema_get_vport_num -- Get the vnic from the mad 129 - * @recvd_mad: Received mad 130 - * 131 - * Return: returns value of the vnic port number 132 - */ 133 - static inline u8 vema_get_vport_num(struct opa_vnic_vema_mad *recvd_mad) 134 - { 135 - return be32_to_cpu(recvd_mad->mad_hdr.attr_mod) & 0xff; 136 - } 137 - 138 - /** 139 - * vema_get_vport_adapter -- Get vnic port adapter from recvd mad 140 - * @recvd_mad: received mad 141 - * @port: ptr to port struct on which MAD 
was recvd 142 - * 143 - * Return: vnic adapter 144 - */ 145 - static inline struct opa_vnic_adapter * 146 - vema_get_vport_adapter(struct opa_vnic_vema_mad *recvd_mad, 147 - struct opa_vnic_vema_port *port) 148 - { 149 - u8 vport_num = vema_get_vport_num(recvd_mad); 150 - 151 - return xa_load(&port->vports, vport_num); 152 - } 153 - 154 - /** 155 - * vema_mac_tbl_req_ok -- Check if mac request has correct values 156 - * @mac_tbl: mac table 157 - * 158 - * This function checks for the validity of the offset and number of 159 - * entries required. 160 - * 161 - * Return: true if offset and num_entries are valid 162 - */ 163 - static inline bool vema_mac_tbl_req_ok(struct opa_veswport_mactable *mac_tbl) 164 - { 165 - u16 offset, num_entries; 166 - u16 req_entries = ((OPA_VNIC_EMA_DATA - sizeof(*mac_tbl)) / 167 - sizeof(mac_tbl->tbl_entries[0])); 168 - 169 - offset = be16_to_cpu(mac_tbl->offset); 170 - num_entries = be16_to_cpu(mac_tbl->num_entries); 171 - 172 - return ((num_entries <= req_entries) && 173 - (offset + num_entries <= OPA_VNIC_MAC_TBL_MAX_ENTRIES)); 174 - } 175 - 176 - /* 177 - * Return the power on default values in the port info structure 178 - * in big endian format as required by MAD. 
179 - */ 180 - static inline void vema_get_pod_values(struct opa_veswport_info *port_info) 181 - { 182 - memset(port_info, 0, sizeof(*port_info)); 183 - port_info->vport.max_mac_tbl_ent = 184 - cpu_to_be16(OPA_VNIC_MAC_TBL_MAX_ENTRIES); 185 - port_info->vport.max_smac_ent = 186 - cpu_to_be16(OPA_VNIC_MAX_SMAC_LIMIT); 187 - port_info->vport.oper_state = OPA_VNIC_STATE_DROP_ALL; 188 - port_info->vport.config_state = OPA_VNIC_STATE_DROP_ALL; 189 - port_info->vesw.eth_mtu = cpu_to_be16(ETH_DATA_LEN); 190 - } 191 - 192 - /** 193 - * vema_add_vport -- Add a new vnic port 194 - * @port: ptr to opa_vnic_vema_port struct 195 - * @vport_num: vnic port number (to be added) 196 - * 197 - * Return a pointer to the vnic adapter structure 198 - */ 199 - static struct opa_vnic_adapter *vema_add_vport(struct opa_vnic_vema_port *port, 200 - u8 vport_num) 201 - { 202 - struct opa_vnic_ctrl_port *cport = port->cport; 203 - struct opa_vnic_adapter *adapter; 204 - 205 - adapter = opa_vnic_add_netdev(cport->ibdev, port->port_num, vport_num); 206 - if (!IS_ERR(adapter)) { 207 - int rc; 208 - 209 - adapter->cport = cport; 210 - rc = xa_insert(&port->vports, vport_num, adapter, GFP_KERNEL); 211 - if (rc < 0) { 212 - opa_vnic_rem_netdev(adapter); 213 - adapter = ERR_PTR(rc); 214 - } 215 - } 216 - 217 - return adapter; 218 - } 219 - 220 - /** 221 - * vema_get_class_port_info -- Get class info for port 222 - * @port: Port on whic MAD was received 223 - * @recvd_mad: pointer to the received mad 224 - * @rsp_mad: pointer to respose mad 225 - * 226 - * This function copies the latest class port info value set for the 227 - * port and stores it for generating traps 228 - */ 229 - static void vema_get_class_port_info(struct opa_vnic_vema_port *port, 230 - struct opa_vnic_vema_mad *recvd_mad, 231 - struct opa_vnic_vema_mad *rsp_mad) 232 - { 233 - struct opa_class_port_info *port_info; 234 - 235 - port_info = (struct opa_class_port_info *)rsp_mad->data; 236 - memcpy(port_info, &port->class_port_info, 
sizeof(*port_info)); 237 - port_info->base_version = OPA_MGMT_BASE_VERSION; 238 - port_info->class_version = OPA_EMA_CLASS_VERSION; 239 - 240 - /* 241 - * Set capability mask bit indicating agent generates traps, 242 - * and set the maximum number of VNIC ports supported. 243 - */ 244 - port_info->cap_mask = cpu_to_be16((OPA_VNIC_CLASS_CAP_TRAP | 245 - (OPA_VNIC_MAX_NUM_VPORT << 8))); 246 - 247 - /* 248 - * Since a get routine is always sent by the EM first we 249 - * set the expected response time to 250 - * 4.096 usec * 2^18 == 1.0737 sec here. 251 - */ 252 - port_info->cap_mask2_resp_time = cpu_to_be32(18); 253 - } 254 - 255 - /** 256 - * vema_set_class_port_info -- Get class info for port 257 - * @port: Port on whic MAD was received 258 - * @recvd_mad: pointer to the received mad 259 - * @rsp_mad: pointer to respose mad 260 - * 261 - * This function updates the port class info for the specific vnic 262 - * and sets up the response mad data 263 - */ 264 - static void vema_set_class_port_info(struct opa_vnic_vema_port *port, 265 - struct opa_vnic_vema_mad *recvd_mad, 266 - struct opa_vnic_vema_mad *rsp_mad) 267 - { 268 - memcpy(&port->class_port_info, recvd_mad->data, 269 - sizeof(port->class_port_info)); 270 - 271 - vema_get_class_port_info(port, recvd_mad, rsp_mad); 272 - } 273 - 274 - /** 275 - * vema_get_veswport_info -- Get veswport info 276 - * @port: source port on which MAD was received 277 - * @recvd_mad: pointer to the received mad 278 - * @rsp_mad: pointer to respose mad 279 - */ 280 - static void vema_get_veswport_info(struct opa_vnic_vema_port *port, 281 - struct opa_vnic_vema_mad *recvd_mad, 282 - struct opa_vnic_vema_mad *rsp_mad) 283 - { 284 - struct opa_veswport_info *port_info = 285 - (struct opa_veswport_info *)rsp_mad->data; 286 - struct opa_vnic_adapter *adapter; 287 - 288 - adapter = vema_get_vport_adapter(recvd_mad, port); 289 - if (adapter) { 290 - memset(port_info, 0, sizeof(*port_info)); 291 - opa_vnic_get_vesw_info(adapter, 
&port_info->vesw); 292 - opa_vnic_get_per_veswport_info(adapter, 293 - &port_info->vport); 294 - } else { 295 - vema_get_pod_values(port_info); 296 - } 297 - } 298 - 299 - /** 300 - * vema_set_veswport_info -- Set veswport info 301 - * @port: source port on which MAD was received 302 - * @recvd_mad: pointer to the received mad 303 - * @rsp_mad: pointer to respose mad 304 - * 305 - * This function gets the port class infor for vnic 306 - */ 307 - static void vema_set_veswport_info(struct opa_vnic_vema_port *port, 308 - struct opa_vnic_vema_mad *recvd_mad, 309 - struct opa_vnic_vema_mad *rsp_mad) 310 - { 311 - struct opa_vnic_ctrl_port *cport = port->cport; 312 - struct opa_veswport_info *port_info; 313 - struct opa_vnic_adapter *adapter; 314 - u8 vport_num; 315 - 316 - vport_num = vema_get_vport_num(recvd_mad); 317 - 318 - adapter = vema_get_vport_adapter(recvd_mad, port); 319 - if (!adapter) { 320 - adapter = vema_add_vport(port, vport_num); 321 - if (IS_ERR(adapter)) { 322 - c_err("failed to add vport %d: %ld\n", 323 - vport_num, PTR_ERR(adapter)); 324 - goto err_exit; 325 - } 326 - } 327 - 328 - port_info = (struct opa_veswport_info *)recvd_mad->data; 329 - opa_vnic_set_vesw_info(adapter, &port_info->vesw); 330 - opa_vnic_set_per_veswport_info(adapter, &port_info->vport); 331 - 332 - /* Process the new config settings */ 333 - opa_vnic_process_vema_config(adapter); 334 - 335 - vema_get_veswport_info(port, recvd_mad, rsp_mad); 336 - return; 337 - 338 - err_exit: 339 - rsp_mad->mad_hdr.status = OPA_VNIC_INVAL_ATTR; 340 - } 341 - 342 - /** 343 - * vema_get_mac_entries -- Get MAC entries in VNIC MAC table 344 - * @port: source port on which MAD was received 345 - * @recvd_mad: pointer to the received mad 346 - * @rsp_mad: pointer to respose mad 347 - * 348 - * This function gets the MAC entries that are programmed into 349 - * the VNIC MAC forwarding table. 
It checks for the validity of 350 - * the index into the MAC table and the number of entries that 351 - * are to be retrieved. 352 - */ 353 - static void vema_get_mac_entries(struct opa_vnic_vema_port *port, 354 - struct opa_vnic_vema_mad *recvd_mad, 355 - struct opa_vnic_vema_mad *rsp_mad) 356 - { 357 - struct opa_veswport_mactable *mac_tbl_in, *mac_tbl_out; 358 - struct opa_vnic_adapter *adapter; 359 - 360 - adapter = vema_get_vport_adapter(recvd_mad, port); 361 - if (!adapter) { 362 - rsp_mad->mad_hdr.status = OPA_VNIC_INVAL_ATTR; 363 - return; 364 - } 365 - 366 - mac_tbl_in = (struct opa_veswport_mactable *)recvd_mad->data; 367 - mac_tbl_out = (struct opa_veswport_mactable *)rsp_mad->data; 368 - 369 - if (vema_mac_tbl_req_ok(mac_tbl_in)) { 370 - mac_tbl_out->offset = mac_tbl_in->offset; 371 - mac_tbl_out->num_entries = mac_tbl_in->num_entries; 372 - opa_vnic_query_mac_tbl(adapter, mac_tbl_out); 373 - } else { 374 - rsp_mad->mad_hdr.status = OPA_VNIC_INVAL_ATTR; 375 - } 376 - } 377 - 378 - /** 379 - * vema_set_mac_entries -- Set MAC entries in VNIC MAC table 380 - * @port: source port on which MAD was received 381 - * @recvd_mad: pointer to the received mad 382 - * @rsp_mad: pointer to respose mad 383 - * 384 - * This function sets the MAC entries in the VNIC forwarding table 385 - * It checks for the validity of the index and the number of forwarding 386 - * table entries to be programmed. 
387 - */ 388 - static void vema_set_mac_entries(struct opa_vnic_vema_port *port, 389 - struct opa_vnic_vema_mad *recvd_mad, 390 - struct opa_vnic_vema_mad *rsp_mad) 391 - { 392 - struct opa_veswport_mactable *mac_tbl; 393 - struct opa_vnic_adapter *adapter; 394 - 395 - adapter = vema_get_vport_adapter(recvd_mad, port); 396 - if (!adapter) { 397 - rsp_mad->mad_hdr.status = OPA_VNIC_INVAL_ATTR; 398 - return; 399 - } 400 - 401 - mac_tbl = (struct opa_veswport_mactable *)recvd_mad->data; 402 - if (vema_mac_tbl_req_ok(mac_tbl)) { 403 - if (opa_vnic_update_mac_tbl(adapter, mac_tbl)) 404 - rsp_mad->mad_hdr.status = OPA_VNIC_UNSUP_ATTR; 405 - } else { 406 - rsp_mad->mad_hdr.status = OPA_VNIC_UNSUP_ATTR; 407 - } 408 - vema_get_mac_entries(port, recvd_mad, rsp_mad); 409 - } 410 - 411 - /** 412 - * vema_set_delete_vesw -- Reset VESW info to POD values 413 - * @port: source port on which MAD was received 414 - * @recvd_mad: pointer to the received mad 415 - * @rsp_mad: pointer to respose mad 416 - * 417 - * This function clears all the fields of veswport info for the requested vesw 418 - * and sets them back to the power-on default values. It does not delete the 419 - * vesw. 
420 - */ 421 - static void vema_set_delete_vesw(struct opa_vnic_vema_port *port, 422 - struct opa_vnic_vema_mad *recvd_mad, 423 - struct opa_vnic_vema_mad *rsp_mad) 424 - { 425 - struct opa_veswport_info *port_info = 426 - (struct opa_veswport_info *)rsp_mad->data; 427 - struct opa_vnic_adapter *adapter; 428 - 429 - adapter = vema_get_vport_adapter(recvd_mad, port); 430 - if (!adapter) { 431 - rsp_mad->mad_hdr.status = OPA_VNIC_INVAL_ATTR; 432 - return; 433 - } 434 - 435 - vema_get_pod_values(port_info); 436 - opa_vnic_set_vesw_info(adapter, &port_info->vesw); 437 - opa_vnic_set_per_veswport_info(adapter, &port_info->vport); 438 - 439 - /* Process the new config settings */ 440 - opa_vnic_process_vema_config(adapter); 441 - 442 - opa_vnic_release_mac_tbl(adapter); 443 - 444 - vema_get_veswport_info(port, recvd_mad, rsp_mad); 445 - } 446 - 447 - /** 448 - * vema_get_mac_list -- Get the unicast/multicast macs. 449 - * @port: source port on which MAD was received 450 - * @recvd_mad: Received mad contains fields to set vnic parameters 451 - * @rsp_mad: Response mad to be built 452 - * @attr_id: Attribute ID indicating multicast or unicast mac list 453 - */ 454 - static void vema_get_mac_list(struct opa_vnic_vema_port *port, 455 - struct opa_vnic_vema_mad *recvd_mad, 456 - struct opa_vnic_vema_mad *rsp_mad, 457 - u16 attr_id) 458 - { 459 - struct opa_veswport_iface_macs *macs_in, *macs_out; 460 - int max_entries = (OPA_VNIC_EMA_DATA - sizeof(*macs_out)) / ETH_ALEN; 461 - struct opa_vnic_adapter *adapter; 462 - 463 - adapter = vema_get_vport_adapter(recvd_mad, port); 464 - if (!adapter) { 465 - rsp_mad->mad_hdr.status = OPA_VNIC_INVAL_ATTR; 466 - return; 467 - } 468 - 469 - macs_in = (struct opa_veswport_iface_macs *)recvd_mad->data; 470 - macs_out = (struct opa_veswport_iface_macs *)rsp_mad->data; 471 - 472 - macs_out->start_idx = macs_in->start_idx; 473 - if (macs_in->num_macs_in_msg) 474 - macs_out->num_macs_in_msg = macs_in->num_macs_in_msg; 475 - else 476 - 
macs_out->num_macs_in_msg = cpu_to_be16(max_entries); 477 - 478 - if (attr_id == OPA_EM_ATTR_IFACE_MCAST_MACS) 479 - opa_vnic_query_mcast_macs(adapter, macs_out); 480 - else 481 - opa_vnic_query_ucast_macs(adapter, macs_out); 482 - } 483 - 484 - /** 485 - * vema_get_summary_counters -- Gets summary counters. 486 - * @port: source port on which MAD was received 487 - * @recvd_mad: Received mad contains fields to set vnic parameters 488 - * @rsp_mad: Response mad to be built 489 - */ 490 - static void vema_get_summary_counters(struct opa_vnic_vema_port *port, 491 - struct opa_vnic_vema_mad *recvd_mad, 492 - struct opa_vnic_vema_mad *rsp_mad) 493 - { 494 - struct opa_veswport_summary_counters *cntrs; 495 - struct opa_vnic_adapter *adapter; 496 - 497 - adapter = vema_get_vport_adapter(recvd_mad, port); 498 - if (adapter) { 499 - cntrs = (struct opa_veswport_summary_counters *)rsp_mad->data; 500 - opa_vnic_get_summary_counters(adapter, cntrs); 501 - } else { 502 - rsp_mad->mad_hdr.status = OPA_VNIC_INVAL_ATTR; 503 - } 504 - } 505 - 506 - /** 507 - * vema_get_error_counters -- Gets summary counters. 
508 - * @port: source port on which MAD was received 509 - * @recvd_mad: Received mad contains fields to set vnic parameters 510 - * @rsp_mad: Response mad to be built 511 - */ 512 - static void vema_get_error_counters(struct opa_vnic_vema_port *port, 513 - struct opa_vnic_vema_mad *recvd_mad, 514 - struct opa_vnic_vema_mad *rsp_mad) 515 - { 516 - struct opa_veswport_error_counters *cntrs; 517 - struct opa_vnic_adapter *adapter; 518 - 519 - adapter = vema_get_vport_adapter(recvd_mad, port); 520 - if (adapter) { 521 - cntrs = (struct opa_veswport_error_counters *)rsp_mad->data; 522 - opa_vnic_get_error_counters(adapter, cntrs); 523 - } else { 524 - rsp_mad->mad_hdr.status = OPA_VNIC_INVAL_ATTR; 525 - } 526 - } 527 - 528 - /** 529 - * vema_get -- Process received get MAD 530 - * @port: source port on which MAD was received 531 - * @recvd_mad: Received mad 532 - * @rsp_mad: Response mad to be built 533 - */ 534 - static void vema_get(struct opa_vnic_vema_port *port, 535 - struct opa_vnic_vema_mad *recvd_mad, 536 - struct opa_vnic_vema_mad *rsp_mad) 537 - { 538 - u16 attr_id = be16_to_cpu(recvd_mad->mad_hdr.attr_id); 539 - 540 - switch (attr_id) { 541 - case OPA_EM_ATTR_CLASS_PORT_INFO: 542 - vema_get_class_port_info(port, recvd_mad, rsp_mad); 543 - break; 544 - case OPA_EM_ATTR_VESWPORT_INFO: 545 - vema_get_veswport_info(port, recvd_mad, rsp_mad); 546 - break; 547 - case OPA_EM_ATTR_VESWPORT_MAC_ENTRIES: 548 - vema_get_mac_entries(port, recvd_mad, rsp_mad); 549 - break; 550 - case OPA_EM_ATTR_IFACE_UCAST_MACS: 551 - case OPA_EM_ATTR_IFACE_MCAST_MACS: 552 - vema_get_mac_list(port, recvd_mad, rsp_mad, attr_id); 553 - break; 554 - case OPA_EM_ATTR_VESWPORT_SUMMARY_COUNTERS: 555 - vema_get_summary_counters(port, recvd_mad, rsp_mad); 556 - break; 557 - case OPA_EM_ATTR_VESWPORT_ERROR_COUNTERS: 558 - vema_get_error_counters(port, recvd_mad, rsp_mad); 559 - break; 560 - default: 561 - rsp_mad->mad_hdr.status = OPA_VNIC_UNSUP_ATTR; 562 - break; 563 - } 564 - } 565 - 566 - /** 
- * vema_set -- Process received set MAD
- * @port: source port on which MAD was received
- * @recvd_mad: Received mad contains fields to set vnic parameters
- * @rsp_mad: Response mad to be built
- */
-static void vema_set(struct opa_vnic_vema_port *port,
-		     struct opa_vnic_vema_mad *recvd_mad,
-		     struct opa_vnic_vema_mad *rsp_mad)
-{
-	u16 attr_id = be16_to_cpu(recvd_mad->mad_hdr.attr_id);
-
-	switch (attr_id) {
-	case OPA_EM_ATTR_CLASS_PORT_INFO:
-		vema_set_class_port_info(port, recvd_mad, rsp_mad);
-		break;
-	case OPA_EM_ATTR_VESWPORT_INFO:
-		vema_set_veswport_info(port, recvd_mad, rsp_mad);
-		break;
-	case OPA_EM_ATTR_VESWPORT_MAC_ENTRIES:
-		vema_set_mac_entries(port, recvd_mad, rsp_mad);
-		break;
-	case OPA_EM_ATTR_DELETE_VESW:
-		vema_set_delete_vesw(port, recvd_mad, rsp_mad);
-		break;
-	default:
-		rsp_mad->mad_hdr.status = OPA_VNIC_UNSUP_ATTR;
-		break;
-	}
-}
-
-/**
- * vema_send -- Send handler for VEMA MAD agent
- * @mad_agent: pointer to the mad agent
- * @mad_wc: pointer to mad send work completion information
- *
- * Free all the data structures associated with the sent MAD
- */
-static void vema_send(struct ib_mad_agent *mad_agent,
-		      struct ib_mad_send_wc *mad_wc)
-{
-	rdma_destroy_ah(mad_wc->send_buf->ah, RDMA_DESTROY_AH_SLEEPABLE);
-	ib_free_send_mad(mad_wc->send_buf);
-}
-
-/**
- * vema_recv -- Recv handler for VEMA MAD agent
- * @mad_agent: pointer to the mad agent
- * @send_buf: Send buffer if found, else NULL
- * @mad_wc: pointer to mad send work completion information
- *
- * Handle only set and get methods and respond to other methods
- * as unsupported. Allocate response buffer and address handle
- * for the response MAD.
- */
-static void vema_recv(struct ib_mad_agent *mad_agent,
-		      struct ib_mad_send_buf *send_buf,
-		      struct ib_mad_recv_wc *mad_wc)
-{
-	struct opa_vnic_vema_port *port;
-	struct ib_ah *ah;
-	struct ib_mad_send_buf *rsp;
-	struct opa_vnic_vema_mad *vema_mad;
-
-	if (!mad_wc || !mad_wc->recv_buf.mad)
-		return;
-
-	port = mad_agent->context;
-	ah = ib_create_ah_from_wc(mad_agent->qp->pd, mad_wc->wc,
-				  mad_wc->recv_buf.grh, mad_agent->port_num);
-	if (IS_ERR(ah))
-		goto free_recv_mad;
-
-	rsp = ib_create_send_mad(mad_agent, mad_wc->wc->src_qp,
-				 mad_wc->wc->pkey_index, 0,
-				 IB_MGMT_VENDOR_HDR, OPA_VNIC_EMA_DATA,
-				 GFP_KERNEL, OPA_MGMT_BASE_VERSION);
-	if (IS_ERR(rsp))
-		goto err_rsp;
-
-	rsp->ah = ah;
-	vema_mad = rsp->mad;
-	memcpy(vema_mad, mad_wc->recv_buf.mad, IB_MGMT_VENDOR_HDR);
-	vema_mad->mad_hdr.method = IB_MGMT_METHOD_GET_RESP;
-	vema_mad->mad_hdr.status = 0;
-
-	/* Lock ensures network adapter is not removed */
-	mutex_lock(&port->lock);
-
-	switch (mad_wc->recv_buf.mad->mad_hdr.method) {
-	case IB_MGMT_METHOD_GET:
-		vema_get(port, (struct opa_vnic_vema_mad *)mad_wc->recv_buf.mad,
-			 vema_mad);
-		break;
-	case IB_MGMT_METHOD_SET:
-		vema_set(port, (struct opa_vnic_vema_mad *)mad_wc->recv_buf.mad,
-			 vema_mad);
-		break;
-	default:
-		vema_mad->mad_hdr.status = OPA_VNIC_UNSUP_ATTR;
-		break;
-	}
-	mutex_unlock(&port->lock);
-
-	if (!ib_post_send_mad(rsp, NULL)) {
-		/*
-		 * with post send successful ah and send mad
-		 * will be destroyed in send handler
-		 */
-		goto free_recv_mad;
-	}
-
-	ib_free_send_mad(rsp);
-
-err_rsp:
-	rdma_destroy_ah(ah, RDMA_DESTROY_AH_SLEEPABLE);
-free_recv_mad:
-	ib_free_recv_mad(mad_wc);
-}
-
-/**
- * vema_get_port -- Gets the opa_vnic_vema_port
- * @cport: pointer to control dev
- * @port_num: Port number
- *
- * This function loops through the ports and returns
- * the opa_vnic_vema port structure that is associated
- * with the OPA port number
- *
- * Return: ptr to requested opa_vnic_vema_port strucure
- * if success, NULL if not
- */
-static struct opa_vnic_vema_port *
-vema_get_port(struct opa_vnic_ctrl_port *cport, u8 port_num)
-{
-	struct opa_vnic_vema_port *port = (void *)cport + sizeof(*cport);
-
-	if (port_num > cport->num_ports)
-		return NULL;
-
-	return port + (port_num - 1);
-}
-
-/**
- * opa_vnic_vema_send_trap -- This function sends a trap to the EM
- * @adapter: pointer to vnic adapter
- * @data: pointer to trap data filled by calling function
- * @lid: issuers lid (encap_slid from vesw_port_info)
- *
- * This function is called from the VNIC driver to send a trap if there
- * is somethng the EM should be notified about. These events currently
- * are
- * 1) UNICAST INTERFACE MACADDRESS changes
- * 2) MULTICAST INTERFACE MACADDRESS changes
- * 3) ETHERNET LINK STATUS changes
- * While allocating the send mad the remote site qpn used is 1
- * as this is the well known QP.
- *
- */
-void opa_vnic_vema_send_trap(struct opa_vnic_adapter *adapter,
-			     struct __opa_veswport_trap *data, u32 lid)
-{
-	struct opa_vnic_ctrl_port *cport = adapter->cport;
-	struct ib_mad_send_buf *send_buf;
-	struct opa_vnic_vema_port *port;
-	struct ib_device *ibp;
-	struct opa_vnic_vema_mad_trap *trap_mad;
-	struct opa_class_port_info *class;
-	struct rdma_ah_attr ah_attr;
-	struct ib_ah *ah;
-	struct opa_veswport_trap *trap;
-	u32 trap_lid;
-	u16 pkey_idx;
-
-	if (!cport)
-		goto err_exit;
-	ibp = cport->ibdev;
-	port = vema_get_port(cport, data->opaportnum);
-	if (!port || !port->mad_agent)
-		goto err_exit;
-
-	if (time_before(jiffies, adapter->trap_timeout)) {
-		if (adapter->trap_count == OPA_VNIC_TRAP_BURST_LIMIT) {
-			v_warn("Trap rate exceeded\n");
-			goto err_exit;
-		} else {
-			adapter->trap_count++;
-		}
-	} else {
-		adapter->trap_count = 0;
-	}
-
-	class = &port->class_port_info;
-	/* Set up address handle */
-	memset(&ah_attr, 0, sizeof(ah_attr));
-	ah_attr.type = rdma_ah_find_type(ibp, port->port_num);
-	rdma_ah_set_sl(&ah_attr,
-		       GET_TRAP_SL_FROM_CLASS_PORT_INFO(class->trap_sl_rsvd));
-	rdma_ah_set_port_num(&ah_attr, port->port_num);
-	trap_lid = be32_to_cpu(class->trap_lid);
-	/*
-	 * check for trap lid validity, must not be zero
-	 * The trap sink could change after we fashion the MAD but since traps
-	 * are not guaranteed we won't use a lock as anyway the change will take
-	 * place even with locking.
-	 */
-	if (!trap_lid) {
-		c_err("%s: Invalid dlid\n", __func__);
-		goto err_exit;
-	}
-
-	rdma_ah_set_dlid(&ah_attr, trap_lid);
-	ah = rdma_create_ah(port->mad_agent->qp->pd, &ah_attr, 0);
-	if (IS_ERR(ah)) {
-		c_err("%s:Couldn't create new AH = %p\n", __func__, ah);
-		c_err("%s:dlid = %d, sl = %d, port = %d\n", __func__,
-		      rdma_ah_get_dlid(&ah_attr), rdma_ah_get_sl(&ah_attr),
-		      rdma_ah_get_port_num(&ah_attr));
-		goto err_exit;
-	}
-
-	if (ib_find_pkey(ibp, data->opaportnum, IB_DEFAULT_PKEY_FULL,
-			 &pkey_idx) < 0) {
-		c_err("%s:full key not found, defaulting to partial\n",
-		      __func__);
-		if (ib_find_pkey(ibp, data->opaportnum, IB_DEFAULT_PKEY_PARTIAL,
-				 &pkey_idx) < 0)
-			pkey_idx = 1;
-	}
-
-	send_buf = ib_create_send_mad(port->mad_agent, 1, pkey_idx, 0,
-				      IB_MGMT_VENDOR_HDR, IB_MGMT_MAD_DATA,
-				      GFP_ATOMIC, OPA_MGMT_BASE_VERSION);
-	if (IS_ERR(send_buf)) {
-		c_err("%s:Couldn't allocate send buf\n", __func__);
-		goto err_sndbuf;
-	}
-
-	send_buf->ah = ah;
-
-	/* Set up common MAD hdr */
-	trap_mad = send_buf->mad;
-	trap_mad->mad_hdr.base_version = OPA_MGMT_BASE_VERSION;
-	trap_mad->mad_hdr.mgmt_class = OPA_MGMT_CLASS_INTEL_EMA;
-	trap_mad->mad_hdr.class_version = OPA_EMA_CLASS_VERSION;
-	trap_mad->mad_hdr.method = IB_MGMT_METHOD_TRAP;
-	port->tid++;
-	trap_mad->mad_hdr.tid = cpu_to_be64(port->tid);
-	trap_mad->mad_hdr.attr_id = IB_SMP_ATTR_NOTICE;
-
-	/* Set up vendor OUI */
-	trap_mad->oui[0] = INTEL_OUI_1;
-	trap_mad->oui[1] = INTEL_OUI_2;
-	trap_mad->oui[2] = INTEL_OUI_3;
-
-	/* Setup notice attribute portion */
-	trap_mad->notice.gen_type = OPA_INTEL_EMA_NOTICE_TYPE_INFO << 1;
-	trap_mad->notice.oui_1 = INTEL_OUI_1;
-	trap_mad->notice.oui_2 = INTEL_OUI_2;
-	trap_mad->notice.oui_3 = INTEL_OUI_3;
-	trap_mad->notice.issuer_lid = cpu_to_be32(lid);
-
-	/* copy the actual trap data */
-	trap = (struct opa_veswport_trap *)trap_mad->notice.raw_data;
-	trap->fabric_id = cpu_to_be16(data->fabric_id);
-	trap->veswid = cpu_to_be16(data->veswid);
-	trap->veswportnum = cpu_to_be32(data->veswportnum);
-	trap->opaportnum = cpu_to_be16(data->opaportnum);
-	trap->veswportindex = data->veswportindex;
-	trap->opcode = data->opcode;
-
-	/* If successful send set up rate limit timeout else bail */
-	if (ib_post_send_mad(send_buf, NULL)) {
-		ib_free_send_mad(send_buf);
-	} else {
-		if (adapter->trap_count)
-			return;
-		adapter->trap_timeout = jiffies +
-					usecs_to_jiffies(OPA_VNIC_TRAP_TIMEOUT);
-		return;
-	}
-
-err_sndbuf:
-	rdma_destroy_ah(ah, 0);
-err_exit:
-	v_err("Aborting trap\n");
-}
-
-static void opa_vnic_event(struct ib_event_handler *handler,
-			   struct ib_event *record)
-{
-	struct opa_vnic_vema_port *port =
-		container_of(handler, struct opa_vnic_vema_port, event_handler);
-	struct opa_vnic_ctrl_port *cport = port->cport;
-	struct opa_vnic_adapter *adapter;
-	unsigned long index;
-
-	if (record->element.port_num != port->port_num)
-		return;
-
-	c_dbg("OPA_VNIC received event %d on device %s port %d\n",
-	      record->event, dev_name(&record->device->dev),
-	      record->element.port_num);
-
-	if (record->event != IB_EVENT_PORT_ERR &&
-	    record->event != IB_EVENT_PORT_ACTIVE)
-		return;
-
-	xa_for_each(&port->vports, index, adapter) {
-		if (record->event == IB_EVENT_PORT_ACTIVE)
-			netif_carrier_on(adapter->netdev);
-		else
-			netif_carrier_off(adapter->netdev);
-	}
-}
-
-/**
- * vema_unregister -- Unregisters agent
- * @cport: pointer to control port
- *
- * This deletes the registration by VEMA for MADs
- */
-static void vema_unregister(struct opa_vnic_ctrl_port *cport)
-{
-	struct opa_vnic_adapter *adapter;
-	unsigned long index;
-	int i;
-
-	for (i = 1; i <= cport->num_ports; i++) {
-		struct opa_vnic_vema_port *port = vema_get_port(cport, i);
-
-		if (!port->mad_agent)
-			continue;
-
-		/* Lock ensures no MAD is being processed */
-		mutex_lock(&port->lock);
-		xa_for_each(&port->vports, index, adapter)
-			opa_vnic_rem_netdev(adapter);
-		mutex_unlock(&port->lock);
-
-		ib_unregister_mad_agent(port->mad_agent);
-		port->mad_agent = NULL;
-		mutex_destroy(&port->lock);
-		xa_destroy(&port->vports);
-		ib_unregister_event_handler(&port->event_handler);
-	}
-}
-
-/**
- * vema_register -- Registers agent
- * @cport: pointer to control port
- *
- * This function registers the handlers for the VEMA MADs
- *
- * Return: returns 0 on success. non zero otherwise
- */
-static int vema_register(struct opa_vnic_ctrl_port *cport)
-{
-	struct ib_mad_reg_req reg_req = {
-		.mgmt_class = OPA_MGMT_CLASS_INTEL_EMA,
-		.mgmt_class_version = OPA_MGMT_BASE_VERSION,
-		.oui = { INTEL_OUI_1, INTEL_OUI_2, INTEL_OUI_3 }
-	};
-	int i;
-
-	set_bit(IB_MGMT_METHOD_GET, reg_req.method_mask);
-	set_bit(IB_MGMT_METHOD_SET, reg_req.method_mask);
-
-	/* register ib event handler and mad agent for each port on dev */
-	for (i = 1; i <= cport->num_ports; i++) {
-		struct opa_vnic_vema_port *port = vema_get_port(cport, i);
-		int ret;
-
-		port->cport = cport;
-		port->port_num = i;
-
-		INIT_IB_EVENT_HANDLER(&port->event_handler,
-				      cport->ibdev, opa_vnic_event);
-		ib_register_event_handler(&port->event_handler);
-
-		xa_init(&port->vports);
-		mutex_init(&port->lock);
-		port->mad_agent = ib_register_mad_agent(cport->ibdev, i,
-							IB_QPT_GSI, &reg_req,
-							IB_MGMT_RMPP_VERSION,
-							vema_send, vema_recv,
-							port, 0);
-		if (IS_ERR(port->mad_agent)) {
-			ret = PTR_ERR(port->mad_agent);
-			port->mad_agent = NULL;
-			mutex_destroy(&port->lock);
-			vema_unregister(cport);
-			return ret;
-		}
-	}
-
-	return 0;
-}
-
-/**
- * opa_vnic_ctrl_config_dev -- This function sends a trap to the EM
- * by way of ib_modify_port to indicate support for ethernet on the
- * fabric.
- * @cport: pointer to control port
- * @en: enable or disable ethernet on fabric support
- */
-static void opa_vnic_ctrl_config_dev(struct opa_vnic_ctrl_port *cport, bool en)
-{
-	struct ib_port_modify pm = { 0 };
-	int i;
-
-	if (en)
-		pm.set_port_cap_mask = OPA_CAP_MASK3_IsEthOnFabricSupported;
-	else
-		pm.clr_port_cap_mask = OPA_CAP_MASK3_IsEthOnFabricSupported;
-
-	for (i = 1; i <= cport->num_ports; i++)
-		ib_modify_port(cport->ibdev, i, IB_PORT_OPA_MASK_CHG, &pm);
-}
-
-/**
- * opa_vnic_vema_add_one -- Handle new ib device
- * @device: ib device pointer
- *
- * Allocate the vnic control port and initialize it.
- */
-static int opa_vnic_vema_add_one(struct ib_device *device)
-{
-	struct opa_vnic_ctrl_port *cport;
-	int rc, size = sizeof(*cport);
-
-	if (!rdma_cap_opa_vnic(device))
-		return -EOPNOTSUPP;
-
-	size += device->phys_port_cnt * sizeof(struct opa_vnic_vema_port);
-	cport = kzalloc(size, GFP_KERNEL);
-	if (!cport)
-		return -ENOMEM;
-
-	cport->num_ports = device->phys_port_cnt;
-	cport->ibdev = device;
-
-	/* Initialize opa vnic management agent (vema) */
-	rc = vema_register(cport);
-	if (!rc)
-		c_info("VNIC client initialized\n");
-
-	ib_set_client_data(device, &opa_vnic_client, cport);
-	opa_vnic_ctrl_config_dev(cport, true);
-	return 0;
-}
-
-/**
- * opa_vnic_vema_rem_one -- Handle ib device removal
- * @device: ib device pointer
- * @client_data: ib client data
- *
- * Uninitialize and free the vnic control port.
- */
-static void opa_vnic_vema_rem_one(struct ib_device *device,
-				  void *client_data)
-{
-	struct opa_vnic_ctrl_port *cport = client_data;
-
-	c_info("removing VNIC client\n");
-	opa_vnic_ctrl_config_dev(cport, false);
-	vema_unregister(cport);
-	kfree(cport);
-}
-
-static int __init opa_vnic_init(void)
-{
-	int rc;
-
-	rc = ib_register_client(&opa_vnic_client);
-	if (rc)
-		pr_err("VNIC driver register failed %d\n", rc);
-
-	return rc;
-}
-module_init(opa_vnic_init);
-
-static void opa_vnic_deinit(void)
-{
-	ib_unregister_client(&opa_vnic_client);
-}
-module_exit(opa_vnic_deinit);
-
-MODULE_LICENSE("Dual BSD/GPL");
-MODULE_AUTHOR("Cornelis Networks");
-MODULE_DESCRIPTION("Cornelis OPX Virtual Network driver");
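
opa_vnic_vema_add_one() and vema_get_port() above share one per-device layout. A single zeroed allocation holds the control port, followed by one port structure per physical port, and ports are addressed with 1-based numbers. The user-space sketch below shows that layout. The demo_* types are hypothetical stand-ins for struct opa_vnic_ctrl_port and struct opa_vnic_vema_port. It uses calloc() in place of kzalloc() and a C99 flexible array member in place of the `(void *)cport + sizeof(*cport)` arithmetic. Unlike the removed vema_get_port(), it also rejects port number 0.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical per-port state, standing in for struct opa_vnic_vema_port. */
struct demo_port {
	unsigned int port_num;
};

/* Hypothetical control port, standing in for struct opa_vnic_ctrl_port. */
struct demo_ctrl_port {
	unsigned int num_ports;
	struct demo_port ports[];	/* one entry per physical port */
};

/* One zeroed allocation: the header plus num_ports trailing port structs. */
static struct demo_ctrl_port *demo_ctrl_alloc(unsigned int num_ports)
{
	struct demo_ctrl_port *cport;
	unsigned int i;

	cport = calloc(1, sizeof(*cport) + num_ports * sizeof(cport->ports[0]));
	if (!cport)
		return NULL;
	cport->num_ports = num_ports;
	for (i = 0; i < num_ports; i++)
		cport->ports[i].port_num = i + 1;	/* port numbers are 1-based */
	return cport;
}

/* Map a 1-based port number to its slot; NULL when out of range. */
static struct demo_port *demo_get_port(struct demo_ctrl_port *cport,
				       unsigned int port_num)
{
	if (port_num == 0 || port_num > cport->num_ports)
		return NULL;
	return &cport->ports[port_num - 1];
}
```

Keeping the ports in the same allocation means a single free() (kfree() in the driver) releases everything.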

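The burst limiting in opa_vnic_vema_send_trap() above works as follows. A successful send made while the burst counter is zero opens a timeout window. Within that window, up to OPA_VNIC_TRAP_BURST_LIMIT further traps are allowed, and any trap after that is dropped. The user-space sketch below models that state machine. DEMO_BURST_LIMIT, DEMO_WINDOW and the demo_* names are hypothetical. A plain integer clock compared with `<` stands in for jiffies and time_before(), so counter wraparound is ignored.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical values; the real OPA_VNIC_TRAP_BURST_LIMIT and
 * OPA_VNIC_TRAP_TIMEOUT are defined outside this excerpt. */
#define DEMO_BURST_LIMIT 4
#define DEMO_WINDOW      100	/* window length in clock ticks */

struct demo_limiter {
	unsigned long timeout;	/* end of the current window */
	unsigned int count;	/* traps sent in this window after the first */
};

/* Decide whether a trap may be sent at time 'now'. */
static bool demo_trap_allowed(struct demo_limiter *l, unsigned long now)
{
	if (now < l->timeout) {
		if (l->count == DEMO_BURST_LIMIT)
			return false;	/* burst exhausted: drop the trap */
		l->count++;
	} else {
		l->count = 0;		/* window expired: start counting over */
	}
	return true;
}

/* Call after a successful send; only a zero count opens a new window. */
static void demo_trap_sent(struct demo_limiter *l, unsigned long now)
{
	if (l->count)
		return;
	l->timeout = now + DEMO_WINDOW;
}
```

With these values, ten traps requested on consecutive ticks produce five sends: the one that opens the window plus four burst traps.
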
-390

drivers/infiniband/ulp/opa_vnic/opa_vnic_vema_iface.c

@@ -1,390 +0,0 @@
-/*
- * Copyright(c) 2017 Intel Corporation.
- *
- * This file is provided under a dual BSD/GPLv2 license. When using or
- * redistributing this file, you may do so under either license.
- *
- * GPL LICENSE SUMMARY
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of version 2 of the GNU General Public License as
- * published by the Free Software Foundation.
- *
- * This program is distributed in the hope that it will be useful, but
- * WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
- * General Public License for more details.
- *
- * BSD LICENSE
- *
- * Redistribution and use in source and binary forms, with or without
- * modification, are permitted provided that the following conditions
- * are met:
- *
- * - Redistributions of source code must retain the above copyright
- * notice, this list of conditions and the following disclaimer.
- * - Redistributions in binary form must reproduce the above copyright
- * notice, this list of conditions and the following disclaimer in
- * the documentation and/or other materials provided with the
- * distribution.
- * - Neither the name of Intel Corporation nor the names of its
- * contributors may be used to endorse or promote products derived
- * from this software without specific prior written permission.
- *
- * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
- * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
- * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
- * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
- * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
- * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
- * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
- * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
- * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
- * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
- * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
- *
- */
-
-/*
- * This file contains OPA VNIC EMA Interface functions.
- */
-
-#include "opa_vnic_internal.h"
-
-/**
- * opa_vnic_vema_report_event - sent trap to report the specified event
- * @adapter: vnic port adapter
- * @event: event to be reported
- *
- * This function calls vema api to sent a trap for the given event.
- */
-void opa_vnic_vema_report_event(struct opa_vnic_adapter *adapter, u8 event)
-{
-	struct __opa_veswport_info *info = &adapter->info;
-	struct __opa_veswport_trap trap_data;
-
-	trap_data.fabric_id = info->vesw.fabric_id;
-	trap_data.veswid = info->vesw.vesw_id;
-	trap_data.veswportnum = info->vport.port_num;
-	trap_data.opaportnum = adapter->port_num;
-	trap_data.veswportindex = adapter->vport_num;
-	trap_data.opcode = event;
-
-	opa_vnic_vema_send_trap(adapter, &trap_data, info->vport.encap_slid);
-}
-
-/**
- * opa_vnic_get_summary_counters - get summary counters
- * @adapter: vnic port adapter
- * @cntrs: pointer to destination summary counters structure
- *
- * This function populates the summary counters that is maintained by the
- * given adapter to destination address provided.
- */
-void opa_vnic_get_summary_counters(struct opa_vnic_adapter *adapter,
-				   struct opa_veswport_summary_counters *cntrs)
-{
-	struct opa_vnic_stats vstats;
-	__be64 *dst;
-	u64 *src;
-
-	memset(&vstats, 0, sizeof(vstats));
-	spin_lock(&adapter->stats_lock);
-	adapter->rn_ops->ndo_get_stats64(adapter->netdev, &vstats.netstats);
-	spin_unlock(&adapter->stats_lock);
-
-	cntrs->vp_instance = cpu_to_be16(adapter->vport_num);
-	cntrs->vesw_id = cpu_to_be16(adapter->info.vesw.vesw_id);
-	cntrs->veswport_num = cpu_to_be32(adapter->port_num);
-
-	cntrs->tx_errors = cpu_to_be64(vstats.netstats.tx_errors);
-	cntrs->rx_errors = cpu_to_be64(vstats.netstats.rx_errors);
-	cntrs->tx_packets = cpu_to_be64(vstats.netstats.tx_packets);
-	cntrs->rx_packets = cpu_to_be64(vstats.netstats.rx_packets);
-	cntrs->tx_bytes = cpu_to_be64(vstats.netstats.tx_bytes);
-	cntrs->rx_bytes = cpu_to_be64(vstats.netstats.rx_bytes);
-
-	/*
-	 * This loop depends on layout of
-	 * opa_veswport_summary_counters opa_vnic_stats structures.
-	 */
-	for (dst = &cntrs->tx_unicast, src = &vstats.tx_grp.unicast;
-	     dst < &cntrs->reserved[0]; dst++, src++) {
-		*dst = cpu_to_be64(*src);
-	}
-}
-
-/**
- * opa_vnic_get_error_counters - get error counters
- * @adapter: vnic port adapter
- * @cntrs: pointer to destination error counters structure
- *
- * This function populates the error counters that is maintained by the
- * given adapter to destination address provided.
- */
-void opa_vnic_get_error_counters(struct opa_vnic_adapter *adapter,
-				 struct opa_veswport_error_counters *cntrs)
-{
-	struct opa_vnic_stats vstats;
-
-	memset(&vstats, 0, sizeof(vstats));
-	spin_lock(&adapter->stats_lock);
-	adapter->rn_ops->ndo_get_stats64(adapter->netdev, &vstats.netstats);
-	spin_unlock(&adapter->stats_lock);
-
-	cntrs->vp_instance = cpu_to_be16(adapter->vport_num);
-	cntrs->vesw_id = cpu_to_be16(adapter->info.vesw.vesw_id);
-	cntrs->veswport_num = cpu_to_be32(adapter->port_num);
-
-	cntrs->tx_errors = cpu_to_be64(vstats.netstats.tx_errors);
-	cntrs->rx_errors = cpu_to_be64(vstats.netstats.rx_errors);
-	cntrs->tx_dlid_zero = cpu_to_be64(vstats.tx_dlid_zero);
-	cntrs->tx_drop_state = cpu_to_be64(vstats.tx_drop_state);
-	cntrs->tx_logic = cpu_to_be64(vstats.netstats.tx_fifo_errors +
-				      vstats.netstats.tx_carrier_errors);
-
-	cntrs->rx_bad_veswid = cpu_to_be64(vstats.netstats.rx_nohandler);
-	cntrs->rx_runt = cpu_to_be64(vstats.rx_runt);
-	cntrs->rx_oversize = cpu_to_be64(vstats.rx_oversize);
-	cntrs->rx_drop_state = cpu_to_be64(vstats.rx_drop_state);
-	cntrs->rx_logic = cpu_to_be64(vstats.netstats.rx_fifo_errors);
-}
-
-/**
- * opa_vnic_get_vesw_info -- Get the vesw information
- * @adapter: vnic port adapter
- * @info: pointer to destination vesw info structure
- *
- * This function copies the vesw info that is maintained by the
- * given adapter to destination address provided.
- */
-void opa_vnic_get_vesw_info(struct opa_vnic_adapter *adapter,
-			    struct opa_vesw_info *info)
-{
-	struct __opa_vesw_info *src = &adapter->info.vesw;
-	int i;
-
-	info->fabric_id = cpu_to_be16(src->fabric_id);
-	info->vesw_id = cpu_to_be16(src->vesw_id);
-	memcpy(info->rsvd0, src->rsvd0, ARRAY_SIZE(src->rsvd0));
-	info->def_port_mask = cpu_to_be16(src->def_port_mask);
-	memcpy(info->rsvd1, src->rsvd1, ARRAY_SIZE(src->rsvd1));
-	info->pkey = cpu_to_be16(src->pkey);
-
-	memcpy(info->rsvd2, src->rsvd2, ARRAY_SIZE(src->rsvd2));
-	info->u_mcast_dlid = cpu_to_be32(src->u_mcast_dlid);
-	for (i = 0; i < OPA_VESW_MAX_NUM_DEF_PORT; i++)
-		info->u_ucast_dlid[i] = cpu_to_be32(src->u_ucast_dlid[i]);
-
-	info->rc = cpu_to_be32(src->rc);
-
-	memcpy(info->rsvd3, src->rsvd3, ARRAY_SIZE(src->rsvd3));
-	info->eth_mtu = cpu_to_be16(src->eth_mtu);
-	memcpy(info->rsvd4, src->rsvd4, ARRAY_SIZE(src->rsvd4));
-}
-
-/**
- * opa_vnic_set_vesw_info -- Set the vesw information
- * @adapter: vnic port adapter
- * @info: pointer to vesw info structure
- *
- * This function updates the vesw info that is maintained by the
- * given adapter with vesw info provided. Reserved fields are stored
- * and returned back to EM as is.
- */
-void opa_vnic_set_vesw_info(struct opa_vnic_adapter *adapter,
-			    struct opa_vesw_info *info)
-{
-	struct __opa_vesw_info *dst = &adapter->info.vesw;
-	int i;
-
-	dst->fabric_id = be16_to_cpu(info->fabric_id);
-	dst->vesw_id = be16_to_cpu(info->vesw_id);
-	memcpy(dst->rsvd0, info->rsvd0, ARRAY_SIZE(info->rsvd0));
-	dst->def_port_mask = be16_to_cpu(info->def_port_mask);
-	memcpy(dst->rsvd1, info->rsvd1, ARRAY_SIZE(info->rsvd1));
-	dst->pkey = be16_to_cpu(info->pkey);
-
-	memcpy(dst->rsvd2, info->rsvd2, ARRAY_SIZE(info->rsvd2));
-	dst->u_mcast_dlid = be32_to_cpu(info->u_mcast_dlid);
-	for (i = 0; i < OPA_VESW_MAX_NUM_DEF_PORT; i++)
-		dst->u_ucast_dlid[i] = be32_to_cpu(info->u_ucast_dlid[i]);
-
-	dst->rc = be32_to_cpu(info->rc);
-
-	memcpy(dst->rsvd3, info->rsvd3, ARRAY_SIZE(info->rsvd3));
-	dst->eth_mtu = be16_to_cpu(info->eth_mtu);
-	memcpy(dst->rsvd4, info->rsvd4, ARRAY_SIZE(info->rsvd4));
-}
-
-/**
- * opa_vnic_get_per_veswport_info -- Get the vesw per port information
- * @adapter: vnic port adapter
- * @info: pointer to destination vport info structure
- *
- * This function copies the vesw per port info that is maintained by the
- * given adapter to destination address provided.
- * Note that the read only fields are not copied.
- */
-void opa_vnic_get_per_veswport_info(struct opa_vnic_adapter *adapter,
-				    struct opa_per_veswport_info *info)
-{
-	struct __opa_per_veswport_info *src = &adapter->info.vport;
-
-	info->port_num = cpu_to_be32(src->port_num);
-	info->eth_link_status = src->eth_link_status;
-	memcpy(info->rsvd0, src->rsvd0, ARRAY_SIZE(src->rsvd0));
-
-	memcpy(info->base_mac_addr, src->base_mac_addr,
-	       ARRAY_SIZE(info->base_mac_addr));
-	info->config_state = src->config_state;
-	info->oper_state = src->oper_state;
-	info->max_mac_tbl_ent = cpu_to_be16(src->max_mac_tbl_ent);
-	info->max_smac_ent = cpu_to_be16(src->max_smac_ent);
-	info->mac_tbl_digest = cpu_to_be32(src->mac_tbl_digest);
-	memcpy(info->rsvd1, src->rsvd1, ARRAY_SIZE(src->rsvd1));
-
-	info->encap_slid = cpu_to_be32(src->encap_slid);
-	memcpy(info->pcp_to_sc_uc, src->pcp_to_sc_uc,
-	       ARRAY_SIZE(info->pcp_to_sc_uc));
-	memcpy(info->pcp_to_vl_uc, src->pcp_to_vl_uc,
-	       ARRAY_SIZE(info->pcp_to_vl_uc));
-	memcpy(info->pcp_to_sc_mc, src->pcp_to_sc_mc,
-	       ARRAY_SIZE(info->pcp_to_sc_mc));
-	memcpy(info->pcp_to_vl_mc, src->pcp_to_vl_mc,
-	       ARRAY_SIZE(info->pcp_to_vl_mc));
-	info->non_vlan_sc_uc = src->non_vlan_sc_uc;
-	info->non_vlan_vl_uc = src->non_vlan_vl_uc;
-	info->non_vlan_sc_mc = src->non_vlan_sc_mc;
-	info->non_vlan_vl_mc = src->non_vlan_vl_mc;
-	memcpy(info->rsvd2, src->rsvd2, ARRAY_SIZE(src->rsvd2));
-
-	info->uc_macs_gen_count = cpu_to_be16(src->uc_macs_gen_count);
-	info->mc_macs_gen_count = cpu_to_be16(src->mc_macs_gen_count);
-	memcpy(info->rsvd3, src->rsvd3, ARRAY_SIZE(src->rsvd3));
-}
-
-/**
- * opa_vnic_set_per_veswport_info -- Set vesw per port information
- * @adapter: vnic port adapter
- * @info: pointer to vport info structure
- *
- * This function updates the vesw per port info that is maintained by the
- * given adapter with vesw per port info provided. Reserved fields are
- * stored and returned back to EM as is.
- */
-void opa_vnic_set_per_veswport_info(struct opa_vnic_adapter *adapter,
-				    struct opa_per_veswport_info *info)
-{
-	struct __opa_per_veswport_info *dst = &adapter->info.vport;
-
-	dst->port_num = be32_to_cpu(info->port_num);
-	memcpy(dst->rsvd0, info->rsvd0, ARRAY_SIZE(info->rsvd0));
-
-	memcpy(dst->base_mac_addr, info->base_mac_addr,
-	       ARRAY_SIZE(dst->base_mac_addr));
-	dst->config_state = info->config_state;
-	memcpy(dst->rsvd1, info->rsvd1, ARRAY_SIZE(info->rsvd1));
-
-	dst->encap_slid = be32_to_cpu(info->encap_slid);
-	memcpy(dst->pcp_to_sc_uc, info->pcp_to_sc_uc,
-	       ARRAY_SIZE(dst->pcp_to_sc_uc));
-	memcpy(dst->pcp_to_vl_uc, info->pcp_to_vl_uc,
-	       ARRAY_SIZE(dst->pcp_to_vl_uc));
-	memcpy(dst->pcp_to_sc_mc, info->pcp_to_sc_mc,
-	       ARRAY_SIZE(dst->pcp_to_sc_mc));
-	memcpy(dst->pcp_to_vl_mc, info->pcp_to_vl_mc,
-	       ARRAY_SIZE(dst->pcp_to_vl_mc));
-	dst->non_vlan_sc_uc = info->non_vlan_sc_uc;
-	dst->non_vlan_vl_uc = info->non_vlan_vl_uc;
-	dst->non_vlan_sc_mc = info->non_vlan_sc_mc;
-	dst->non_vlan_vl_mc = info->non_vlan_vl_mc;
-	memcpy(dst->rsvd2, info->rsvd2, ARRAY_SIZE(info->rsvd2));
-	memcpy(dst->rsvd3, info->rsvd3, ARRAY_SIZE(info->rsvd3));
-}
-
-/**
- * opa_vnic_query_mcast_macs - query multicast mac list
- * @adapter: vnic port adapter
- * @macs: pointer mac list
- *
- * This function populates the provided mac list with the configured
- * multicast addresses in the adapter.
- */
-void opa_vnic_query_mcast_macs(struct opa_vnic_adapter *adapter,
-			       struct opa_veswport_iface_macs *macs)
-{
-	u16 start_idx, num_macs, idx = 0, count = 0;
-	struct netdev_hw_addr *ha;
-
-	start_idx = be16_to_cpu(macs->start_idx);
-	num_macs = be16_to_cpu(macs->num_macs_in_msg);
-	netdev_for_each_mc_addr(ha, adapter->netdev) {
-		struct opa_vnic_iface_mac_entry *entry = &macs->entry[count];
-
-		if (start_idx > idx++)
-			continue;
-		else if (num_macs == count)
-			break;
-		memcpy(entry, ha->addr, sizeof(*entry));
-		count++;
-	}
-
-	macs->tot_macs_in_lst = cpu_to_be16(netdev_mc_count(adapter->netdev));
-	macs->num_macs_in_msg = cpu_to_be16(count);
-	macs->gen_count = cpu_to_be16(adapter->info.vport.mc_macs_gen_count);
-}
-
-/**
- * opa_vnic_query_ucast_macs - query unicast mac list
- * @adapter: vnic port adapter
- * @macs: pointer mac list
- *
- * This function populates the provided mac list with the configured
- * unicast addresses in the adapter.
- */
-void opa_vnic_query_ucast_macs(struct opa_vnic_adapter *adapter,
-			       struct opa_veswport_iface_macs *macs)
-{
-	u16 start_idx, tot_macs, num_macs, idx = 0, count = 0, em_macs = 0;
-	struct netdev_hw_addr *ha;
-
-	start_idx = be16_to_cpu(macs->start_idx);
-	num_macs = be16_to_cpu(macs->num_macs_in_msg);
-	/* loop through dev_addrs list first */
-	for_each_dev_addr(adapter->netdev, ha) {
-		struct opa_vnic_iface_mac_entry *entry = &macs->entry[count];
-
-		/* Do not include EM specified MAC address */
-		if (!memcmp(adapter->info.vport.base_mac_addr, ha->addr,
-			    ARRAY_SIZE(adapter->info.vport.base_mac_addr))) {
-			em_macs++;
-			continue;
-		}
-
-		if (start_idx > idx++)
-			continue;
-		else if (num_macs == count)
-			break;
-		memcpy(entry, ha->addr, sizeof(*entry));
-		count++;
-	}
-
-	/* loop through uc list */
-	netdev_for_each_uc_addr(ha, adapter->netdev) {
-		struct opa_vnic_iface_mac_entry *entry = &macs->entry[count];
-
-		if (start_idx > idx++)
-			continue;
-		else if (num_macs == count)
-			break;
-		memcpy(entry, ha->addr, sizeof(*entry));
-		count++;
-	}
-
-	tot_macs = netdev_hw_addr_list_count(&adapter->netdev->dev_addrs) +
-		   netdev_uc_count(adapter->netdev) - em_macs;
-	macs->tot_macs_in_lst = cpu_to_be16(tot_macs);
-	macs->num_macs_in_msg = cpu_to_be16(count);
-	macs->gen_count = cpu_to_be16(adapter->info.vport.uc_macs_gen_count);
-}
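
opa_vnic_query_mcast_macs() and opa_vnic_query_ucast_macs() above page through the interface's address lists. The EM supplies start_idx and num_macs_in_msg. The driver skips the first start_idx addresses, copies at most num_macs_in_msg entries, and reports the full list length in tot_macs_in_lst. The minimal sketch below works over a plain array. The demo_* names and the array input are hypothetical stand-ins for the netdev address lists. It also omits the unicast variant's exclusion of the EM-assigned base MAC address.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define DEMO_ETH_ALEN 6	/* bytes per MAC address */

/*
 * Hypothetical helper: copy at most num_macs addresses from
 * src[0..src_count), skipping the first start_idx entries, into dst.
 * Returns how many addresses were copied (the "num_macs_in_msg" value);
 * the caller reports src_count as the total list length.
 */
static uint16_t demo_query_macs(const uint8_t (*src)[DEMO_ETH_ALEN],
				uint16_t src_count, uint16_t start_idx,
				uint16_t num_macs,
				uint8_t (*dst)[DEMO_ETH_ALEN])
{
	uint16_t idx, count = 0;

	for (idx = 0; idx < src_count; idx++) {
		if (idx < start_idx)
			continue;	/* before the requested page */
		if (count == num_macs)
			break;		/* response page is full */
		memcpy(dst[count], src[idx], DEMO_ETH_ALEN);
		count++;
	}
	return count;
}
```

When a request starts past the end of the list, it returns zero entries while the reported total stays unchanged.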

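The counter getters above convert every 64-bit host-order value with cpu_to_be64() before placing it in the response MAD. The summary loop walks consecutive fields, which assumes that opa_veswport_summary_counters and opa_vnic_stats keep matching layouts. The sketch below performs the same host-to-big-endian conversion in portable user-space C. It writes bytes into a buffer rather than walking struct fields through a pointer, which is a deliberate substitution. demo_put_be64() and demo_pack_counters() are hypothetical names, not kernel APIs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Store v at p in big-endian (network) byte order, as cpu_to_be64() does. */
static void demo_put_be64(uint8_t *p, uint64_t v)
{
	int i;

	for (i = 7; i >= 0; i--) {
		p[i] = (uint8_t)(v & 0xff);	/* least significant byte last */
		v >>= 8;
	}
}

/* Pack n host-order counters into a big-endian buffer of 8 * n bytes. */
static void demo_pack_counters(uint8_t *wire, const uint64_t *cntrs, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		demo_put_be64(wire + 8 * i, cntrs[i]);
}
```

Because the byte order is built explicitly with shifts, the result is the same on little- and big-endian hosts.
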
+1 -1

drivers/infiniband/ulp/rtrs/rtrs-clt.c

@@ -3219,7 +3219,7 @@
 		pr_err("Failed to create rtrs-client dev class\n");
 		return ret;
 	}
-	rtrs_wq = alloc_workqueue("rtrs_client_wq", 0, 0);
+	rtrs_wq = alloc_workqueue("rtrs_client_wq", WQ_PERCPU, 0);
 	if (!rtrs_wq) {
 		class_unregister(&rtrs_clt_dev_class);
 		return -ENOMEM;

+1 -1

drivers/infiniband/ulp/rtrs/rtrs-srv.c

··· 2385 2385 if (err) 2386 2386 goto out_err; 2387 2387 2388 - rtrs_wq = alloc_workqueue("rtrs_server_wq", 0, 0); 2388 + rtrs_wq = alloc_workqueue("rtrs_server_wq", WQ_PERCPU, 0); 2389 2389 if (!rtrs_wq) { 2390 2390 err = -ENOMEM; 2391 2391 goto out_dev_class;

+1 -66

drivers/net/ethernet/mellanox/mlx5/core/main.c

··· 111 111 112 112 }, 113 113 [2] = { 114 - .mask = MLX5_PROF_MASK_QP_SIZE | 115 - MLX5_PROF_MASK_MR_CACHE, 114 + .mask = MLX5_PROF_MASK_QP_SIZE, 116 115 .log_max_qp = LOG_MAX_SUPPORTED_QPS, 117 116 .num_cmd_caches = MLX5_NUM_COMMAND_CACHES, 118 - .mr_cache[0] = { 119 - .size = 500, 120 - .limit = 250 121 - }, 122 - .mr_cache[1] = { 123 - .size = 500, 124 - .limit = 250 125 - }, 126 - .mr_cache[2] = { 127 - .size = 500, 128 - .limit = 250 129 - }, 130 - .mr_cache[3] = { 131 - .size = 500, 132 - .limit = 250 133 - }, 134 - .mr_cache[4] = { 135 - .size = 500, 136 - .limit = 250 137 - }, 138 - .mr_cache[5] = { 139 - .size = 500, 140 - .limit = 250 141 - }, 142 - .mr_cache[6] = { 143 - .size = 500, 144 - .limit = 250 145 - }, 146 - .mr_cache[7] = { 147 - .size = 500, 148 - .limit = 250 149 - }, 150 - .mr_cache[8] = { 151 - .size = 500, 152 - .limit = 250 153 - }, 154 - .mr_cache[9] = { 155 - .size = 500, 156 - .limit = 250 157 - }, 158 - .mr_cache[10] = { 159 - .size = 500, 160 - .limit = 250 161 - }, 162 - .mr_cache[11] = { 163 - .size = 500, 164 - .limit = 250 165 - }, 166 - .mr_cache[12] = { 167 - .size = 64, 168 - .limit = 32 169 - }, 170 - .mr_cache[13] = { 171 - .size = 32, 172 - .limit = 16 173 - }, 174 - .mr_cache[14] = { 175 - .size = 16, 176 - .limit = 8 177 - }, 178 - .mr_cache[15] = { 179 - .size = 8, 180 - .limit = 4 181 - }, 182 117 }, 183 118 [3] = { 184 119 .mask = MLX5_PROF_MASK_QP_SIZE,

+10 -1

drivers/net/ethernet/microsoft/mana/mana_en.c

··· 2926 2926 ethtool_rxfh_indir_default(i, apc->num_queues); 2927 2927 } 2928 2928 2929 + int mana_disable_vport_rx(struct mana_port_context *apc) 2930 + { 2931 + return mana_cfg_vport_steering(apc, TRI_STATE_FALSE, false, false, 2932 + false); 2933 + } 2934 + EXPORT_SYMBOL_NS(mana_disable_vport_rx, "NET_MANA"); 2935 + 2929 2936 int mana_config_rss(struct mana_port_context *apc, enum TRI_STATE rx, 2930 2937 bool update_hash, bool update_tab) 2931 2938 { ··· 3319 3312 */ 3320 3313 3321 3314 apc->rss_state = TRI_STATE_FALSE; 3322 - err = mana_config_rss(apc, TRI_STATE_FALSE, false, false); 3315 + err = mana_disable_vport_rx(apc); 3323 3316 if (err && mana_en_need_log(apc, err)) 3324 3317 netdev_err(ndev, "Failed to disable vPort: %d\n", err); 3318 + 3319 + mana_fence_rqs(apc); 3325 3320 3326 3321 /* Even in err case, still need to cleanup the vPort */ 3327 3322 mana_destroy_vport(apc);

-11

include/linux/mlx5/driver.h

··· 705 705 706 706 enum { 707 707 MLX5_PROF_MASK_QP_SIZE = (u64)1 << 0, 708 - MLX5_PROF_MASK_MR_CACHE = (u64)1 << 1, 709 - }; 710 - 711 - enum { 712 - MKEY_CACHE_LAST_STD_ENTRY = 20, 713 - MLX5_IMR_KSM_CACHE_ENTRY, 714 - MAX_MKEY_CACHE_ENTRIES 715 708 }; 716 709 717 710 struct mlx5_profile { 718 711 u64 mask; 719 712 u8 log_max_qp; 720 713 u8 num_cmd_caches; 721 - struct { 722 - int size; 723 - int limit; 724 - } mr_cache[MAX_MKEY_CACHE_ENTRIES]; 725 714 }; 726 715 727 716 struct mlx5_hca_cap {

+5

include/net/mana/gdma.h

··· 792 792 GDMA_ACCESS_FLAG_REMOTE_READ = BIT_ULL(2), 793 793 GDMA_ACCESS_FLAG_REMOTE_WRITE = BIT_ULL(3), 794 794 GDMA_ACCESS_FLAG_REMOTE_ATOMIC = BIT_ULL(4), 795 + GDMA_ACCESS_FLAG_BIND_MW = BIT_ULL(5), 795 796 }; 796 797 797 798 /* GDMA_CREATE_DMA_REGION */ ··· 885 884 GDMA_MR_TYPE_ZBVA = 4, 886 885 /* Device address MRs */ 887 886 GDMA_MR_TYPE_DM = 5, 887 + /* Memory Window type 1 */ 888 + GDMA_MR_TYPE_MW1 = 6, 889 + /* Memory Window type 2 */ 890 + GDMA_MR_TYPE_MW2 = 7, 888 891 }; 889 892 890 893 struct gdma_create_mr_params {

+1

include/net/mana/mana.h

··· 573 573 netdev_tx_t mana_start_xmit(struct sk_buff *skb, struct net_device *ndev); 574 574 int mana_config_rss(struct mana_port_context *ac, enum TRI_STATE rx, 575 575 bool update_hash, bool update_tab); 576 + int mana_disable_vport_rx(struct mana_port_context *apc); 576 577 577 578 int mana_alloc_queues(struct net_device *ndev); 578 579 int mana_attach(struct net_device *ndev);

+39

include/rdma/frmr_pools.h

··· 1 + /* SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB 2 + * 3 + * Copyright (c) 2025, NVIDIA CORPORATION & AFFILIATES. All rights reserved. 4 + */ 5 + 6 + #ifndef FRMR_POOLS_H 7 + #define FRMR_POOLS_H 8 + 9 + #include <linux/types.h> 10 + #include <asm/page.h> 11 + 12 + struct ib_device; 13 + struct ib_mr; 14 + 15 + struct ib_frmr_key { 16 + u64 vendor_key; 17 + /* A pool with non-zero kernel_vendor_key is a kernel-only pool. */ 18 + u64 kernel_vendor_key; 19 + size_t num_dma_blocks; 20 + int access_flags; 21 + u8 ats:1; 22 + }; 23 + 24 + struct ib_frmr_pool_ops { 25 + int (*create_frmrs)(struct ib_device *device, struct ib_frmr_key *key, 26 + u32 *handles, u32 count); 27 + void (*destroy_frmrs)(struct ib_device *device, u32 *handles, 28 + u32 count); 29 + int (*build_key)(struct ib_device *device, const struct ib_frmr_key *in, 30 + struct ib_frmr_key *out); 31 + }; 32 + 33 + int ib_frmr_pools_init(struct ib_device *device, 34 + const struct ib_frmr_pool_ops *pool_ops); 35 + void ib_frmr_pools_cleanup(struct ib_device *device); 36 + int ib_frmr_pool_pop(struct ib_device *device, struct ib_mr *mr); 37 + int ib_frmr_pool_push(struct ib_device *device, struct ib_mr *mr); 38 + 39 + #endif /* FRMR_POOLS_H */
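The new `frmr_pools.h` interface lets the core cache fast-registration MR handles. A driver creates and destroys those handles in batches through `ib_frmr_pool_ops`, and consumers take and return them with `ib_frmr_pool_pop()` and `ib_frmr_pool_push()`. The header does not show the pooling policy itself, so the following is only a hypothetical userspace sketch of the general idea for a single key. It reuses a cached handle when one exists, calls the driver's create callback otherwise, and passes handles to the driver's destroy callback once the cache is full. `struct frmr_pool_sketch`, `POOL_CAP`, `pool_pop()` and `pool_push()` are invented for illustration. A real implementation would presumably keep a separate cache for each distinct `struct ib_frmr_key`.

```c
#include <stddef.h>
#include <stdint.h>

#define POOL_CAP 8 /* hypothetical cache size */

/* Modelled loosely on ib_frmr_pool_ops.create_frmrs / destroy_frmrs. */
typedef int (*create_fn)(uint32_t *handles, uint32_t count);
typedef void (*destroy_fn)(uint32_t *handles, uint32_t count);

struct frmr_pool_sketch {
	uint32_t handles[POOL_CAP]; /* cached handles not currently in use */
	uint32_t n_free;
	create_fn create;
	destroy_fn destroy;
};

/* Take a cached handle, or ask the "driver" to create a new one. */
static int pool_pop(struct frmr_pool_sketch *p, uint32_t *out)
{
	if (p->n_free > 0) {
		*out = p->handles[--p->n_free];
		return 0;
	}
	return p->create(out, 1);
}

/* Return a handle for reuse; if the cache is full, give it back to the driver. */
static void pool_push(struct frmr_pool_sketch *p, uint32_t handle)
{
	if (p->n_free == POOL_CAP) {
		p->destroy(&handle, 1);
		return;
	}
	p->handles[p->n_free++] = handle;
}
```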

+2 -2

include/rdma/ib_cache.h

··· 34 34 35 35 /** 36 36 * ib_get_cached_pkey - Returns a cached PKey table entry 37 - * @device: The device to query. 37 + * @device_handle: The device to query. 38 38 * @port_num: The port number of the device to query. 39 39 * @index: The index into the cached PKey table to query. 40 40 * @pkey: The PKey value found at the specified index. ··· 80 80 * ib_get_cached_port_state - Returns a cached port state table entry 81 81 * @device: The device to query. 82 82 * @port_num: The port number of the device to query. 83 - * @port_state: port_state for the specified port for that device. 83 + * @port_active: port_state for the specified port for that device. 84 84 * 85 85 * ib_get_cached_port_state() fetches the specified port_state table entry stored in 86 86 * the local software cache.

+29 -37

include/rdma/ib_umem.h

··· 7 7 #ifndef IB_UMEM_H 8 8 #define IB_UMEM_H 9 9 10 - #include <linux/list.h> 11 10 #include <linux/scatterlist.h> 12 - #include <linux/workqueue.h> 13 - #include <rdma/ib_verbs.h> 14 11 15 - struct ib_ucontext; 16 - struct ib_umem_odp; 12 + struct ib_device; 17 13 struct dma_buf_attach_ops; 18 14 19 15 struct ib_umem { ··· 18 22 u64 iova; 19 23 size_t length; 20 24 unsigned long address; 25 + unsigned long dma_attrs; 21 26 u32 writable : 1; 22 27 u32 is_odp : 1; 23 28 u32 is_dmabuf : 1; ··· 33 36 struct scatterlist *last_sg; 34 37 unsigned long first_sg_offset; 35 38 unsigned long last_sg_trim; 39 + void (*pinned_revoke)(void *priv); 36 40 void *private; 37 41 u8 pinned : 1; 38 42 u8 revoked : 1; ··· 73 75 { 74 76 return ib_umem_num_dma_blocks(umem, PAGE_SIZE); 75 77 } 76 - 77 - static inline void __rdma_umem_block_iter_start(struct ib_block_iter *biter, 78 - struct ib_umem *umem, 79 - unsigned long pgsz) 80 - { 81 - __rdma_block_iter_start(biter, umem->sgt_append.sgt.sgl, 82 - umem->sgt_append.sgt.nents, pgsz); 83 - biter->__sg_advance = ib_umem_offset(umem) & ~(pgsz - 1); 84 - biter->__sg_numblocks = ib_umem_num_dma_blocks(umem, pgsz); 85 - } 86 - 87 - static inline bool __rdma_umem_block_iter_next(struct ib_block_iter *biter) 88 - { 89 - return __rdma_block_iter_next(biter) && biter->__sg_numblocks--; 90 - } 91 - 92 - /** 93 - * rdma_umem_for_each_dma_block - iterate over contiguous DMA blocks of the umem 94 - * @umem: umem to iterate over 95 - * @pgsz: Page size to split the list into 96 - * 97 - * pgsz must be <= PAGE_SIZE or computed by ib_umem_find_best_pgsz(). The 98 - * returned DMA blocks will be aligned to pgsz and span the range: 99 - * ALIGN_DOWN(umem->address, pgsz) to ALIGN(umem->address + umem->length, pgsz) 100 - * 101 - * Performs exactly ib_umem_num_dma_blocks() iterations. 
102 - */ 103 - #define rdma_umem_for_each_dma_block(umem, biter, pgsz) \ 104 - for (__rdma_umem_block_iter_start(biter, umem, pgsz); \ 105 - __rdma_umem_block_iter_next(biter);) 106 - 107 78 #ifdef CONFIG_INFINIBAND_USER_MEM 108 79 109 80 struct ib_umem *ib_umem_get(struct ib_device *device, unsigned long addr, ··· 88 121 * ib_umem_find_best_pgoff - Find best HW page size 89 122 * 90 123 * @umem: umem struct 91 - * @pgsz_bitmap bitmap of HW supported page sizes 124 + * @pgsz_bitmap: bitmap of HW supported page sizes 92 125 * @pgoff_bitmask: Mask of bits that can be represented with an offset 93 126 * 94 127 * This is very similar to ib_umem_find_best_pgsz() except instead of accepting ··· 101 134 * 102 135 * If the pgoff_bitmask requires either alignment in the low bit or an 103 136 * unavailable page size for the high bits, this function returns 0. 137 + * 138 + * Returns: best HW page size for the parameters or 0 if none available 139 + * for the given parameters. 104 140 */ 105 141 static inline unsigned long ib_umem_find_best_pgoff(struct ib_umem *umem, 106 142 unsigned long pgsz_bitmap, ··· 139 169 size_t size, int fd, 140 170 int access); 141 171 struct ib_umem_dmabuf * 172 + ib_umem_dmabuf_get_pinned_revocable_and_lock(struct ib_device *device, 173 + unsigned long offset, size_t size, 174 + int fd, int access); 175 + void ib_umem_dmabuf_set_revoke_locked(struct ib_umem_dmabuf *umem_dmabuf, 176 + void (*revoke)(void *priv), void *priv); 177 + struct ib_umem_dmabuf * 142 178 ib_umem_dmabuf_get_pinned_with_dma_device(struct ib_device *device, 143 179 struct device *dma_device, 144 180 unsigned long offset, size_t size, ··· 152 176 int ib_umem_dmabuf_map_pages(struct ib_umem_dmabuf *umem_dmabuf); 153 177 void ib_umem_dmabuf_unmap_pages(struct ib_umem_dmabuf *umem_dmabuf); 154 178 void ib_umem_dmabuf_release(struct ib_umem_dmabuf *umem_dmabuf); 179 + void ib_umem_dmabuf_revoke_lock(struct ib_umem_dmabuf *umem_dmabuf); 180 + void 
ib_umem_dmabuf_revoke_unlock(struct ib_umem_dmabuf *umem_dmabuf); 155 181 void ib_umem_dmabuf_revoke(struct ib_umem_dmabuf *umem_dmabuf); 156 182 157 183 #else /* CONFIG_INFINIBAND_USER_MEM */ ··· 200 222 } 201 223 202 224 static inline struct ib_umem_dmabuf * 225 + ib_umem_dmabuf_get_pinned_revocable_and_lock(struct ib_device *device, 226 + unsigned long offset, size_t size, 227 + int fd, int access) 228 + { 229 + return ERR_PTR(-EOPNOTSUPP); 230 + } 231 + 232 + static inline void 233 + ib_umem_dmabuf_set_revoke_locked(struct ib_umem_dmabuf *umem_dmabuf, 234 + void (*revoke)(void *priv), void *priv) {} 235 + 236 + static inline struct ib_umem_dmabuf * 203 237 ib_umem_dmabuf_get_pinned_with_dma_device(struct ib_device *device, 204 238 struct device *dma_device, 205 239 unsigned long offset, size_t size, ··· 226 236 } 227 237 static inline void ib_umem_dmabuf_unmap_pages(struct ib_umem_dmabuf *umem_dmabuf) { } 228 238 static inline void ib_umem_dmabuf_release(struct ib_umem_dmabuf *umem_dmabuf) { } 239 + static inline void ib_umem_dmabuf_revoke_lock(struct ib_umem_dmabuf *umem_dmabuf) {} 240 + static inline void ib_umem_dmabuf_revoke_unlock(struct ib_umem_dmabuf *umem_dmabuf) {} 229 241 static inline void ib_umem_dmabuf_revoke(struct ib_umem_dmabuf *umem_dmabuf) {} 230 242 231 243 #endif /* CONFIG_INFINIBAND_USER_MEM */

+105 -89

include/rdma/ib_verbs.h

··· 44 44 #include <uapi/rdma/rdma_user_ioctl.h> 45 45 #include <uapi/rdma/ib_user_ioctl_verbs.h> 46 46 #include <linux/pci-tph.h> 47 + #include <rdma/frmr_pools.h> 47 48 #include <linux/dma-buf.h> 48 49 49 50 #define IB_FW_VERSION_NAME_MAX ETHTOOL_FWVERS_LEN ··· 1577 1576 const struct uverbs_api_object *uapi_object; 1578 1577 }; 1579 1578 1579 + /** 1580 + * struct ib_udata - Driver request/response data from userspace 1581 + * @inbuf: Pointer to request data from userspace 1582 + * @outbuf: Pointer to response buffer in userspace 1583 + * @inlen: Length of request data 1584 + * @outlen: Length of response buffer 1585 + * 1586 + * struct ib_udata is used to hold the driver data request and response 1587 + * structures defined in the uapi. They follow these rules for forwards and 1588 + * backwards compatibility: 1589 + * 1590 + * 1) Userspace can provide a longer request so long as the trailing part the 1591 + * kernel doesn't understand is all zeros. 1592 + * 1593 + * This provides a degree of safety if userspace wrongly tries to use a new 1594 + * feature the kernel does not understand with some non-zero value. 1595 + * 1596 + * It allows a simpler rdma-core implementation because the library can 1597 + * simply always use the latest structs for the request, even if they are 1598 + * bigger. It simply has to avoid using the new members if they are not 1599 + * supported/required. 1600 + * 1601 + * 2) Userspace can provide a shorter request; the kernel will zero-pad it out 1602 + * to fill the storage. The newer kernel should understand that older 1603 + * userspace will provide 0 to new fields. 
The kernel has three options to 1604 + * enable new request fields: 1605 + * 1606 + * - Input comp_mask that says the field is supported 1607 + * - Look for non-zero values 1608 + * - Check if the udata->inlen size covers the field 1609 + * 1610 + * This also corrects any bugs related to not filling in request structures 1611 + * as the new helper always fully writes to the struct. 1612 + * 1613 + * 3) Userspace can provide a shorter or longer response struct. If shorter, 1614 + * the kernel reply is truncated. The kernel should be designed to not write 1615 + * to new reply fields unless userspace has affirmatively requested them. 1616 + * 1617 + * If the user buffer is longer, the kernel will zero-fill it. 1618 + * 1619 + * Userspace has three options to enable new response fields: 1620 + * 1621 + * - Output comp_mask that says the field is supported 1622 + * - Look for non-zero values 1623 + * - Infer the output must be valid because the request contents demand it 1624 + * and old kernels will fail the request 1625 + * 1626 + * The following helper functions implement these semantics: 1627 + * 1628 + * ib_copy_validate_udata_in() - Checks the minimum length, and zero trailing:: 1629 + * 1630 + * struct driver_create_cq_req req; 1631 + * int err; 1632 + * 1633 + * err = ib_copy_validate_udata_in(udata, req, end_member); 1634 + * if (err) 1635 + * return err; 1636 + * 1637 + * The third argument specifies the last member of the struct in the first 1638 + * kernel version that introduced it, establishing the minimum required size. 
1639 + * 1640 + * ib_copy_validate_udata_in_cm() - The above but also validate a 1641 + * comp_mask member only has supported bits set:: 1642 + * 1643 + * err = ib_copy_validate_udata_in_cm(udata, req, first_version_last_member, 1644 + * DRIVER_CREATE_CQ_MASK_FEATURE_A | 1645 + * DRIVER_CREATE_CQ_MASK_FEATURE_B); 1646 + * 1647 + * ib_respond_udata() - Implements the response rules:: 1648 + * 1649 + * struct driver_create_cq_resp resp = {}; 1650 + * 1651 + * resp.some_field = value; 1652 + * return ib_respond_udata(udata, resp); 1653 + * 1654 + * ib_is_udata_in_empty() - Used instead of ib_copy_validate_udata_in() if the 1655 + * driver does not have a request structure:: 1656 + * 1657 + * ret = ib_is_udata_in_empty(udata); 1658 + * if (ret) 1659 + * return ret; 1660 + * 1661 + * Similarly ib_respond_empty_udata() is used instead of ib_respond_udata() if 1662 + * the driver does not have a response structure:: 1663 + * 1664 + * return ib_respond_empty_udata(udata); 1665 + */ 1580 1666 struct ib_udata { 1581 1667 const void __user *inbuf; 1582 1668 void __user *outbuf; ··· 1738 1650 u8 interrupt:1; 1739 1651 u8 shared:1; 1740 1652 unsigned int comp_vector; 1653 + struct ib_umem *umem; 1741 1654 1742 1655 /* 1743 1656 * Implementation details of the RDMA core, don't use in drivers: ··· 1993 1904 struct ib_dm *dm; 1994 1905 struct ib_sig_attrs *sig_attrs; /* only for IB_MR_TYPE_INTEGRITY MRs */ 1995 1906 struct ib_dmah *dmah; 1907 + struct { 1908 + struct ib_frmr_pool *pool; 1909 + struct ib_frmr_key key; 1910 + u32 handle; 1911 + } frmr; 1996 1912 /* 1997 1913 * Implementation details of the RDMA core, don't use in drivers: 1998 1914 */ ··· 2361 2267 2362 2268 /* rdma netdev type - specifies protocol type */ 2363 2269 enum rdma_netdev_t { 2364 - RDMA_NETDEV_OPA_VNIC, 2365 2270 RDMA_NETDEV_IPOIB, 2366 2271 }; 2367 2272 ··· 2374 2281 u32 port_num; 2375 2282 int mtu; 2376 2283 2377 - /* 2378 - * cleanup function must be specified. 
2379 - * FIXME: This is only used for OPA_VNIC and that usage should be 2380 - * removed too. 2381 - */ 2382 2284 void (*free_rdma_netdev)(struct net_device *netdev); 2383 2285 2384 2286 /* control functions */ ··· 2475 2387 enum rdma_driver_id driver_id; 2476 2388 u32 uverbs_abi_ver; 2477 2389 unsigned int uverbs_no_driver_id_binding:1; 2390 + /* 2391 + * Indicates the driver checks every op accepting a udata for the 2392 + * correct size on input and always handles the output using the udata 2393 + * helpers. 2394 + */ 2395 + unsigned int uverbs_robust_udata:1; 2478 2396 2479 2397 /* 2480 2398 * NOTE: New drivers should not make use of device_group; instead new ··· 2513 2419 int (*modify_device)(struct ib_device *device, int device_modify_mask, 2514 2420 struct ib_device_modify *device_modify); 2515 2421 void (*get_dev_fw_str)(struct ib_device *device, char *str); 2516 - const struct cpumask *(*get_vector_affinity)(struct ib_device *ibdev, 2517 - int comp_vector); 2518 2422 int (*query_port)(struct ib_device *device, u32 port_num, 2519 2423 struct ib_port_attr *port_attr); 2520 2424 int (*query_port_speed)(struct ib_device *device, u32 port_num, ··· 2629 2537 int (*destroy_qp)(struct ib_qp *qp, struct ib_udata *udata); 2630 2538 int (*create_cq)(struct ib_cq *cq, const struct ib_cq_init_attr *attr, 2631 2539 struct uverbs_attr_bundle *attrs); 2632 - int (*create_cq_umem)(struct ib_cq *cq, 2540 + int (*create_user_cq)(struct ib_cq *cq, 2633 2541 const struct ib_cq_init_attr *attr, 2634 - struct ib_umem *umem, 2635 2542 struct uverbs_attr_bundle *attrs); 2636 2543 int (*modify_cq)(struct ib_cq *cq, u16 cq_count, u16 cq_period); 2637 2544 int (*destroy_cq)(struct ib_cq *cq, struct ib_udata *udata); 2638 - int (*resize_cq)(struct ib_cq *cq, int cqe, struct ib_udata *udata); 2545 + int (*resize_user_cq)(struct ib_cq *cq, unsigned int cqe, 2546 + struct ib_udata *udata); 2639 2547 /* 2640 2548 * pre_destroy_cq - Prevent a cq from generating any new work 2641 2549 * 
completions, but not free any kernel resources ··· 2999 2907 struct list_head subdev_list; 3000 2908 3001 2909 enum rdma_nl_name_assign_type name_assign_type; 2910 + 2911 + struct ib_frmr_pools *frmr_pools; 3002 2912 }; 3003 2913 3004 2914 static inline void *rdma_zalloc_obj(struct ib_device *dev, size_t size, ··· 3053 2959 u8 no_kverbs_req:1; 3054 2960 }; 3055 2961 3056 - /* 3057 - * IB block DMA iterator 3058 - * 3059 - * Iterates the DMA-mapped SGL in contiguous memory blocks aligned 3060 - * to a HW supported page size. 3061 - */ 3062 - struct ib_block_iter { 3063 - /* internal states */ 3064 - struct scatterlist *__sg; /* sg holding the current aligned block */ 3065 - dma_addr_t __dma_addr; /* unaligned DMA address of this block */ 3066 - size_t __sg_numblocks; /* ib_umem_num_dma_blocks() */ 3067 - unsigned int __sg_nents; /* number of SG entries */ 3068 - unsigned int __sg_advance; /* number of bytes to advance in sg in next step */ 3069 - unsigned int __pg_bit; /* alignment of current block */ 3070 - }; 3071 - 3072 2962 struct ib_device *_ib_alloc_device(size_t size, struct net *net); 3073 2963 #define ib_alloc_device(drv_struct, member) \ 3074 2964 container_of(_ib_alloc_device(sizeof(struct drv_struct) + \ ··· 3080 3002 3081 3003 int ib_register_client (struct ib_client *client); 3082 3004 void ib_unregister_client(struct ib_client *client); 3083 - 3084 - void __rdma_block_iter_start(struct ib_block_iter *biter, 3085 - struct scatterlist *sglist, 3086 - unsigned int nents, 3087 - unsigned long pgsz); 3088 - bool __rdma_block_iter_next(struct ib_block_iter *biter); 3089 - 3090 - /** 3091 - * rdma_block_iter_dma_address - get the aligned dma address of the current 3092 - * block held by the block iterator. 
3093 - * @biter: block iterator holding the memory block 3094 - */ 3095 - static inline dma_addr_t 3096 - rdma_block_iter_dma_address(struct ib_block_iter *biter) 3097 - { 3098 - return biter->__dma_addr & ~(BIT_ULL(biter->__pg_bit) - 1); 3099 - } 3100 - 3101 - /** 3102 - * rdma_for_each_block - iterate over contiguous memory blocks of the sg list 3103 - * @sglist: sglist to iterate over 3104 - * @biter: block iterator holding the memory block 3105 - * @nents: maximum number of sg entries to iterate over 3106 - * @pgsz: best HW supported page size to use 3107 - * 3108 - * Callers may use rdma_block_iter_dma_address() to get each 3109 - * blocks aligned DMA address. 3110 - */ 3111 - #define rdma_for_each_block(sglist, biter, nents, pgsz) \ 3112 - for (__rdma_block_iter_start(biter, sglist, nents, \ 3113 - pgsz); \ 3114 - __rdma_block_iter_next(biter);) 3115 3005 3116 3006 /** 3117 3007 * ib_get_client_data - Get IB client context ··· 4105 4059 __ib_create_cq((device), (cmp_hndlr), (evt_hndlr), (cq_ctxt), (cq_attr), KBUILD_MODNAME) 4106 4060 4107 4061 /** 4108 - * ib_resize_cq - Modifies the capacity of the CQ. 4109 - * @cq: The CQ to resize. 4110 - * @cqe: The minimum size of the CQ. 4111 - * 4112 - * Users can examine the cq structure to determine the actual CQ size. 4113 - */ 4114 - int ib_resize_cq(struct ib_cq *cq, int cqe); 4115 - 4116 - /** 4117 4062 * rdma_set_cq_moderation - Modifies moderation params of the CQ 4118 4063 * @cq: The CQ to modify. 
4119 4064 * @cq_count: number of CQEs that will trigger an event ··· 4909 4872 { 4910 4873 WARN_ON_ONCE(lid & 0xFFFF0000); 4911 4874 return cpu_to_be16((u16)lid); 4912 - } 4913 - 4914 - /** 4915 - * ib_get_vector_affinity - Get the affinity mappings of a given completion 4916 - * vector 4917 - * @device: the rdma device 4918 - * @comp_vector: index of completion vector 4919 - * 4920 - * Returns NULL on failure, otherwise a corresponding cpu map of the 4921 - * completion vector (returns all-cpus map if the device driver doesn't 4922 - * implement get_vector_affinity). 4923 - */ 4924 - static inline const struct cpumask * 4925 - ib_get_vector_affinity(struct ib_device *device, int comp_vector) 4926 - { 4927 - if (comp_vector < 0 || comp_vector >= device->num_comp_vectors || 4928 - !device->ops.get_vector_affinity) 4929 - return NULL; 4930 - 4931 - return device->ops.get_vector_affinity(device, comp_vector); 4932 - 4933 4875 } 4934 4876 4935 4877 /**
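The new `struct ib_udata` comment sets three compatibility rules for driver request and response buffers. The sketch below models them in userspace so the size arithmetic is easy to check. It is not the kernel's `_ib_copy_validate_udata_in()` or `_ib_respond_udata()`. `struct udata_sketch`, `copy_validate_in()` and `respond()` are hypothetical, the buffers are ordinary memory rather than `__user` pointers, and the specific error codes are assumptions: `-EINVAL` for a request shorter than the first version, and `-EOPNOTSUPP` for a non-zero tail the kernel does not understand. `offsetofend()` reproduces the kernel macro of that name, which converts "the last member in the first version" into a minimum size.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Same idea as the kernel's offsetofend(): offset just past MEMBER. */
#define offsetofend(TYPE, MEMBER) \
	(offsetof(TYPE, MEMBER) + sizeof(((TYPE *)0)->MEMBER))

/* Hypothetical userspace model of struct ib_udata. */
struct udata_sketch {
	const void *inbuf;
	void *outbuf;
	size_t inlen;
	size_t outlen;
};

/* Rules 1 and 2: a longer request needs a zero tail; a shorter one is zero-padded. */
static int copy_validate_in(const struct udata_sketch *u, void *req,
			    size_t kernel_size, size_t minimum_size)
{
	const unsigned char *in = u->inbuf;
	size_t i;

	if (u->inlen < minimum_size)
		return -EINVAL;
	/* Bytes beyond what this "kernel" knows about must all be zero. */
	for (i = kernel_size; i < u->inlen; i++)
		if (in[i])
			return -EOPNOTSUPP;
	memset(req, 0, kernel_size);
	memcpy(req, in, u->inlen < kernel_size ? u->inlen : kernel_size);
	return 0;
}

/* Rule 3: truncate for a short user buffer, zero-fill a long one. */
static int respond(struct udata_sketch *u, const void *src, size_t len)
{
	size_t n = u->outlen < len ? u->outlen : len;

	memcpy(u->outbuf, src, n);
	if (u->outlen > n)
		memset((unsigned char *)u->outbuf + n, 0, u->outlen - n);
	return 0;
}
```

For example, older userspace that sends a two-field request is accepted by a kernel whose request struct has grown a third field, and the kernel reads that new field as zero.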

+88

include/rdma/iter.h

··· 1 + /* SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB */ 2 + /* Copyright (c) 2026, NVIDIA CORPORATION & AFFILIATES. */ 3 + 4 + #ifndef _RDMA_ITER_H_ 5 + #define _RDMA_ITER_H_ 6 + 7 + #include <linux/scatterlist.h> 8 + #include <rdma/ib_umem.h> 9 + 10 + /** 11 + * IB block DMA iterator 12 + * 13 + * Iterates the DMA-mapped SGL in contiguous memory blocks aligned 14 + * to a HW supported page size. 15 + */ 16 + struct ib_block_iter { 17 + /* internal states */ 18 + struct scatterlist *__sg; /* sg holding the current aligned block */ 19 + dma_addr_t __dma_addr; /* unaligned DMA address of this block */ 20 + size_t __sg_numblocks; /* ib_umem_num_dma_blocks() */ 21 + unsigned int __sg_nents; /* number of SG entries */ 22 + unsigned int __sg_advance; /* number of bytes to advance in sg in next step */ 23 + unsigned int __pg_bit; /* alignment of current block */ 24 + }; 25 + 26 + void __rdma_block_iter_start(struct ib_block_iter *biter, 27 + struct scatterlist *sglist, 28 + unsigned int nents, 29 + unsigned long pgsz); 30 + bool __rdma_block_iter_next(struct ib_block_iter *biter); 31 + 32 + /** 33 + * rdma_block_iter_dma_address - get the aligned dma address of the current 34 + * block held by the block iterator. 35 + * @biter: block iterator holding the memory block 36 + */ 37 + static inline dma_addr_t 38 + rdma_block_iter_dma_address(struct ib_block_iter *biter) 39 + { 40 + return biter->__dma_addr & ~(BIT_ULL(biter->__pg_bit) - 1); 41 + } 42 + 43 + /** 44 + * rdma_for_each_block - iterate over contiguous memory blocks of the sg list 45 + * @sglist: sglist to iterate over 46 + * @biter: block iterator holding the memory block 47 + * @nents: maximum number of sg entries to iterate over 48 + * @pgsz: best HW supported page size to use 49 + * 50 + * Callers may use rdma_block_iter_dma_address() to get each 51 + * blocks aligned DMA address. 
52 + */ 53 + #define rdma_for_each_block(sglist, biter, nents, pgsz) \ 54 + for (__rdma_block_iter_start(biter, sglist, nents, \ 55 + pgsz); \ 56 + __rdma_block_iter_next(biter);) 57 + 58 + static inline void __rdma_umem_block_iter_start(struct ib_block_iter *biter, 59 + struct ib_umem *umem, 60 + unsigned long pgsz) 61 + { 62 + __rdma_block_iter_start(biter, umem->sgt_append.sgt.sgl, 63 + umem->sgt_append.sgt.nents, pgsz); 64 + biter->__sg_advance = ib_umem_offset(umem) & ~(pgsz - 1); 65 + biter->__sg_numblocks = ib_umem_num_dma_blocks(umem, pgsz); 66 + } 67 + 68 + static inline bool __rdma_umem_block_iter_next(struct ib_block_iter *biter) 69 + { 70 + return __rdma_block_iter_next(biter) && biter->__sg_numblocks--; 71 + } 72 + 73 + /** 74 + * rdma_umem_for_each_dma_block - iterate over contiguous DMA blocks of the umem 75 + * @umem: umem to iterate over 76 + * @pgsz: Page size to split the list into 77 + * 78 + * pgsz must be <= PAGE_SIZE or computed by ib_umem_find_best_pgsz(). The 79 + * returned DMA blocks will be aligned to pgsz and span the range: 80 + * ALIGN_DOWN(umem->address, pgsz) to ALIGN(umem->address + umem->length, pgsz) 81 + * 82 + * Performs exactly ib_umem_num_dma_blocks() iterations. 83 + */ 84 + #define rdma_umem_for_each_dma_block(umem, biter, pgsz) \ 85 + for (__rdma_umem_block_iter_start(biter, umem, pgsz); \ 86 + __rdma_umem_block_iter_next(biter);) 87 + 88 + #endif /* _RDMA_ITER_H_ */
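The `rdma_umem_for_each_dma_block()` comment, now in `rdma/iter.h`, says the blocks are `pgsz`-aligned and span `ALIGN_DOWN(address, pgsz)` to `ALIGN(address + length, pgsz)`. That span fixes the number of iterations. `rdma_block_iter_dma_address()` aligns each block by masking off the low `__pg_bit` bits. The sketch below shows that arithmetic. It assumes a single physically contiguous range instead of a real scatterlist and a power-of-two `pgsz`, and the helper names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* pgsz must be a power of two for these masks to be correct. */
#define ALIGN_DOWN_U64(x, a) ((x) & ~((uint64_t)(a) - 1))
#define ALIGN_UP_U64(x, a)   ALIGN_DOWN_U64((x) + (a) - 1, (a))

/* Number of pgsz-aligned blocks that cover [addr, addr + len). */
static size_t num_dma_blocks(uint64_t addr, uint64_t len, uint64_t pgsz)
{
	return (size_t)((ALIGN_UP_U64(addr + len, pgsz) -
			 ALIGN_DOWN_U64(addr, pgsz)) / pgsz);
}

/* Same masking as rdma_block_iter_dma_address(): clear the low pg_bit bits. */
static uint64_t block_dma_address(uint64_t dma_addr, unsigned int pg_bit)
{
	return dma_addr & ~((UINT64_C(1) << pg_bit) - 1);
}

/* List the aligned block start addresses for one contiguous range. */
static size_t list_blocks(uint64_t addr, uint64_t len, uint64_t pgsz,
			  uint64_t *out, size_t max)
{
	size_t n = num_dma_blocks(addr, len, pgsz), i;
	uint64_t first = ALIGN_DOWN_U64(addr, pgsz);

	for (i = 0; i < n && i < max; i++)
		out[i] = first + i * pgsz;
	return i;
}
```

For example, a 0x2000-byte buffer starting at 0x1100 with a 4 KiB `pgsz` touches three blocks, at 0x1000, 0x2000 and 0x3000. The unaligned head and tail still occupy whole blocks.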

+7 -7

include/rdma/iw_cm.h

··· 33 33 }; 34 34 35 35 /** 36 - * iw_cm_handler - Function to be called by the IW CM when delivering events 37 - * to the client. 36 + * typedef iw_cm_handler - Function to be called by the IW CM when delivering 37 + * events to the client. 38 38 * 39 39 * @cm_id: The IW CM identifier associated with the event. 40 40 * @event: Pointer to the event structure. ··· 43 43 struct iw_cm_event *event); 44 44 45 45 /** 46 - * iw_event_handler - Function called by the provider when delivering provider 47 - * events to the IW CM. Returns either 0 indicating the event was processed 48 - * or -errno if the event could not be processed. 46 + * typedef iw_event_handler - Function called by the provider when delivering 47 + * provider events to the IW CM. Returns either 0 indicating the event was 48 + * processed or -errno if the event could not be processed. 49 49 * 50 50 * @cm_id: The IW CM identifier associated with the event. 51 51 * @event: Pointer to the event structure. ··· 97 97 * iw_create_cm_id - Create an IW CM identifier. 98 98 * 99 99 * @device: The IB device on which to create the IW CM identier. 100 - * @event_handler: User callback invoked to report events associated with the 100 + * @cm_handler: User callback invoked to report events associated with the 101 101 * returned IW CM identifier. 102 102 * @context: User specified context associated with the id. 103 103 */ ··· 147 147 * iw_cm_reject - Reject an incoming connection request. 148 148 * 149 149 * @cm_id: Connection identifier associated with the request. 150 - * @private_daa: Pointer to data to deliver to the remote peer as part of the 150 + * @private_data: Pointer to data to deliver to the remote peer as part of the 151 151 * reject message. 152 152 * @private_data_len: The number of bytes in the private_data parameter. 153 153 *

+5 -3

include/rdma/opa_port_info.h

··· 93 93 #define OPA_LINKINIT_QUARANTINED (9 << 4) 94 94 #define OPA_LINKINIT_INSUFIC_CAPABILITY (10 << 4) 95 95 96 - #define OPA_LINK_SPEED_NOP 0x0000 /* Reserved (1-5 Gbps) */ 97 - #define OPA_LINK_SPEED_12_5G 0x0001 /* 12.5 Gbps */ 98 - #define OPA_LINK_SPEED_25G 0x0002 /* 25.78125? Gbps (EDR) */ 96 + #define OPA_LINK_SPEED_NOP 0x0000 /* no change */ 97 + #define OPA_LINK_SPEED_12_5G 0x0001 /* 12.5 Gbps */ 98 + #define OPA_LINK_SPEED_25G 0x0002 /* 25.78125 Gbps */ 99 + #define OPA_LINK_SPEED_50G 0x0004 /* 53.125 Gbps */ 100 + #define OPA_LINK_SPEED_100G 0x0008 /* 106.25 Gbps */ 99 101 100 102 #define OPA_LINK_WIDTH_1X 0x0001 101 103 #define OPA_LINK_WIDTH_2X 0x0002
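With the new `OPA_LINK_SPEED_50G` and `OPA_LINK_SPEED_100G` definitions, each speed value occupies its own bit. The sketch below reports the fastest rate present in such a mask. It assumes the field being decoded is a bitmask of these values, as a supported-speeds or enabled-speeds field would be. `opa_fastest_speed()` is a hypothetical helper, and the macros are copied from the header above.

```c
#include <stdint.h>

#define OPA_LINK_SPEED_NOP   0x0000 /* no change */
#define OPA_LINK_SPEED_12_5G 0x0001 /* 12.5 Gbps */
#define OPA_LINK_SPEED_25G   0x0002 /* 25.78125 Gbps */
#define OPA_LINK_SPEED_50G   0x0004 /* 53.125 Gbps */
#define OPA_LINK_SPEED_100G  0x0008 /* 106.25 Gbps */

/* Name the fastest known rate set in a mask of OPA_LINK_SPEED_* bits. */
static const char *opa_fastest_speed(uint16_t mask)
{
	if (mask & OPA_LINK_SPEED_100G)
		return "106.25 Gbps";
	if (mask & OPA_LINK_SPEED_50G)
		return "53.125 Gbps";
	if (mask & OPA_LINK_SPEED_25G)
		return "25.78125 Gbps";
	if (mask & OPA_LINK_SPEED_12_5G)
		return "12.5 Gbps";
	/* No known speed bit set; 0 is OPA_LINK_SPEED_NOP. */
	return "no change";
}
```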

-96

include/rdma/opa_vnic.h

··· 1 - /* SPDX-License-Identifier: GPL-2.0 OR BSD-3-Clause */ 2 - /* 3 - * Copyright(c) 2017 - 2020 Intel Corporation. 4 - */ 5 - 6 - #ifndef _OPA_VNIC_H 7 - #define _OPA_VNIC_H 8 - 9 - /* 10 - * This file contains Intel Omni-Path (OPA) Virtual Network Interface 11 - * Controller (VNIC) specific declarations. 12 - */ 13 - 14 - #include <rdma/ib_verbs.h> 15 - 16 - /* 16 header bytes + 2 reserved bytes */ 17 - #define OPA_VNIC_L2_HDR_LEN (16 + 2) 18 - 19 - #define OPA_VNIC_L4_HDR_LEN 2 20 - 21 - #define OPA_VNIC_HDR_LEN (OPA_VNIC_L2_HDR_LEN + \ 22 - OPA_VNIC_L4_HDR_LEN) 23 - 24 - #define OPA_VNIC_L4_ETHR 0x78 25 - 26 - #define OPA_VNIC_ICRC_LEN 4 27 - #define OPA_VNIC_TAIL_LEN 1 28 - #define OPA_VNIC_ICRC_TAIL_LEN (OPA_VNIC_ICRC_LEN + OPA_VNIC_TAIL_LEN) 29 - 30 - #define OPA_VNIC_SKB_MDATA_LEN 4 31 - #define OPA_VNIC_SKB_MDATA_ENCAP_ERR 0x1 32 - 33 - /* opa vnic rdma netdev's private data structure */ 34 - struct opa_vnic_rdma_netdev { 35 - struct rdma_netdev rn; /* keep this first */ 36 - /* followed by device private data */ 37 - char *dev_priv[]; 38 - }; 39 - 40 - static inline void *opa_vnic_priv(const struct net_device *dev) 41 - { 42 - struct rdma_netdev *rn = netdev_priv(dev); 43 - 44 - return rn->clnt_priv; 45 - } 46 - 47 - static inline void *opa_vnic_dev_priv(const struct net_device *dev) 48 - { 49 - struct opa_vnic_rdma_netdev *oparn = netdev_priv(dev); 50 - 51 - return oparn->dev_priv; 52 - } 53 - 54 - /* opa_vnic skb meta data structure */ 55 - struct opa_vnic_skb_mdata { 56 - u8 vl; 57 - u8 entropy; 58 - u8 flags; 59 - u8 rsvd; 60 - } __packed; 61 - 62 - /* OPA VNIC group statistics */ 63 - struct opa_vnic_grp_stats { 64 - u64 unicast; 65 - u64 mcastbcast; 66 - u64 untagged; 67 - u64 vlan; 68 - u64 s_64; 69 - u64 s_65_127; 70 - u64 s_128_255; 71 - u64 s_256_511; 72 - u64 s_512_1023; 73 - u64 s_1024_1518; 74 - u64 s_1519_max; 75 - }; 76 - 77 - struct opa_vnic_stats { 78 - /* standard netdev statistics */ 79 - struct rtnl_link_stats64 netstats; 80 - 81 - 
/* OPA VNIC statistics */ 82 - struct opa_vnic_grp_stats tx_grp; 83 - struct opa_vnic_grp_stats rx_grp; 84 - u64 tx_dlid_zero; 85 - u64 tx_drop_state; 86 - u64 rx_drop_state; 87 - u64 rx_runt; 88 - u64 rx_oversize; 89 - }; 90 - 91 - static inline bool rdma_cap_opa_vnic(struct ib_device *device) 92 - { 93 - return !!(device->attrs.kernel_cap_flags & IBK_RDMA_NETDEV_OPA); 94 - } 95 - 96 - #endif /* _OPA_VNIC_H */

+2

include/rdma/rdma_netlink.h

··· 5 5 6 6 #include <linux/netlink.h> 7 7 #include <uapi/rdma/rdma_netlink.h> 8 + #include <rdma/ib_verbs.h> 8 9 9 10 struct ib_device; 10 11 ··· 127 126 struct list_head list; 128 127 const char *type; 129 128 int (*newlink)(const char *ibdev_name, struct net_device *ndev); 129 + int (*dellink)(struct ib_device *dev); 130 130 }; 131 131 132 132 void rdma_link_register(struct rdma_link_ops *ops);

+10

include/rdma/rdma_vt.h

··· 149 149 /* User context */ 150 150 struct rvt_ucontext { 151 151 struct ib_ucontext ibucontext; 152 + void *priv; 152 153 }; 153 154 154 155 /* Protection domain */ ··· 360 359 361 360 /* Get and return CPU to pin CQ processing thread */ 362 361 int (*comp_vect_cpu_lookup)(struct rvt_dev_info *rdi, int comp_vect); 362 + 363 + /* allocate a ucontext */ 364 + int (*alloc_ucontext)(struct ib_ucontext *uctx, struct ib_udata *udata); 365 + 366 + /* deallocate a ucontext */ 367 + void (*dealloc_ucontext)(struct ib_ucontext *context); 368 + 369 + /* driver mmap */ 370 + int (*mmap)(struct ib_ucontext *context, struct vm_area_struct *vma); 363 371 }; 364 372 365 373 struct rvt_dev_info {

+2 -2

include/rdma/restrack.h

···
 	 * query stage.
 	 */
 	u8 no_track : 1;
-	/*
+	/**
 	 * @kref: Protect destroy of the resource
 	 */
 	struct kref kref;
-	/*
+	/**
 	 * @comp: Signal that all consumers of resource are completed their work
 	 */
 	struct completion comp;

+101

include/rdma/uverbs_ioctl.h

···
 	(udata ? container_of(rdma_udata_to_uverbs_attr_bundle(udata)->context, \
 			      drv_dev_struct, member) : (drv_dev_struct *)NULL)
 
+struct ib_device *rdma_udata_to_dev(struct ib_udata *udata);
+
 #define IS_UVERBS_COPY_ERR(_ret)	((_ret) && (_ret) != -ENOENT)
 
 static inline const struct uverbs_attr *uverbs_attr_get(const struct uverbs_attr_bundle *attrs_bundle,
···
 			       size_t idx, u64 upper_bound, u64 *def_val);
 int uverbs_copy_to_struct_or_zero(const struct uverbs_attr_bundle *bundle,
 				  size_t idx, const void *from, size_t size);
+
+int _ib_copy_validate_udata_in(struct ib_udata *udata, void *req,
+			       size_t kernel_size, size_t minimum_size);
+int _ib_respond_udata(struct ib_udata *udata, const void *src, size_t len);
 #else
 static inline int
 uverbs_get_flags64(u64 *to, const struct uverbs_attr_bundle *attrs_bundle,
···
 _uverbs_get_const_unsigned(u64 *to,
 			   const struct uverbs_attr_bundle *attrs_bundle,
 			   size_t idx, u64 upper_bound, u64 *def_val)
+{
+	return -EINVAL;
+}
+
+static inline int _ib_copy_validate_udata_in(struct ib_udata *udata, void *req,
+					     size_t kernel_size,
+					     size_t minimum_size)
+{
+	return -EINVAL;
+}
+
+static inline int _ib_respond_udata(struct ib_udata *udata, const void *src,
+				    size_t len)
 {
 	return -EINVAL;
 }
···
 				 size_t idx)
 {
 	return uverbs_get_const_signed(to, attrs_bundle, idx);
+}
+
+/**
+ * ib_copy_validate_udata_in - Copy and validate that the request structure is
+ *                             compatible with this kernel
+ * @_udata: The system calls ib_udata struct
+ * @_req: The name of an on-stack structure that holds the driver data
+ * @_end_member: The member in the struct that is the original end of struct
+ *               from the first kernel to introduce it.
+ *
+ * Check that the udata input request struct is properly formed for this kernel.
+ * Then copy it into req
+ */
+#define ib_copy_validate_udata_in(_udata, _req, _end_member) \
+	_ib_copy_validate_udata_in(_udata, &(_req), sizeof(_req), \
+				   offsetofend(typeof(_req), _end_member))
+
+int _ib_copy_validate_udata_cm_fail(struct ib_udata *udata, u64 req_cm,
+				    u64 valid_cm);
+
+/**
+ * ib_copy_validate_udata_in_cm - Copy the req structure and check the comp_mask
+ * @_udata: The system calls ib_udata struct
+ * @_req: The name of an on-stack structure that holds the driver data
+ * @_end_member: The member in the struct that is the original end of struct
+ *               from the first kernel to introduce it.
+ * @_valid_cm: A bitmask of bits permitted in the comp_mask_field.
+ *
+ * Check that the udata input request struct is properly formed for this kernel.
+ * Then copy it into req
+ */
+#define ib_copy_validate_udata_in_cm(_udata, _req, _end_member, _valid_cm) \
+	({ \
+		typeof((_req).comp_mask) __valid_cm = _valid_cm; \
+		int ret = \
+			ib_copy_validate_udata_in(_udata, _req, _end_member); \
+		if (!ret && ((_req).comp_mask & ~__valid_cm)) \
+			ret = _ib_copy_validate_udata_cm_fail( \
+				_udata, (_req).comp_mask, __valid_cm); \
+		ret; \
+	})
+
+/**
+ * ib_is_udata_in_empty - Check if the udata input buffer is all zeros
+ * @udata: The system calls ib_udata struct
+ *
+ * This should be used if the driver does not currently define a driver data
+ * struct. Returns 0 if the buffer is empty or all zeros, -EOPNOTSUPP if
+ * non-zero data is present, or a negative error code on failure.
+ */
+static inline int ib_is_udata_in_empty(struct ib_udata *udata)
+{
+	if (!udata || udata->inlen == 0)
+		return 0;
+	return _ib_copy_validate_udata_in(udata, NULL, 0, 0);
+}
+
+/**
+ * ib_respond_udata - Copy a driver data response to userspace
+ * @_udata: The system calls ib_udata struct
+ * @_rep: Kernel buffer containing the response driver data on the stack
+ *
+ * Copy driver data response structures back to userspace in a way that
+ * is forwards and backwards compatible. Longer kernel structs are truncated,
+ * userspace has made some kind of error if it needed the truncated information.
+ * Shorter structs are zero padded.
+ */
+#define ib_respond_udata(_udata, _rep) \
+	_ib_respond_udata(_udata, &(_rep), sizeof(_rep))
+
+/**
+ * ib_respond_empty_udata - Zero fill the response buffer to userspace
+ * @_udata: The system calls ib_udata struct
+ *
+ * Used when there is no driver response data to return. Provides forward
+ * compatability by zeroing any buffer the user may have provided.
+ */
+static inline int ib_respond_empty_udata(struct ib_udata *udata)
+{
+	if (udata && udata->outlen && clear_user(udata->outbuf, udata->outlen))
+		return -EFAULT;
+	return 0;
 }
 
 #endif
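The new udata helpers encode an ABI-versioning rule: `offsetofend()` of the struct's original last member is the minimum accepted input size, and `sizeof(_req)` is what the running kernel understands. The sketch below is a hypothetical userspace model of `_ib_copy_validate_udata_in()` and `_ib_respond_udata()` written from the comments above, not their actual implementation. Returning `-EINVAL` for a too-short buffer and zero-filling fields that older userspace did not send are assumptions. Returning `-EOPNOTSUPP` for unknown non-zero trailing bytes follows the `ib_is_udata_in_empty()` comment, which passes `kernel_size == 0` so that every input byte must be zero. `struct demo_req` is an invented example ABI struct.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Local copy of the kernel's offsetofend() helper */
#define offsetofend_sketch(TYPE, MEMBER) \
	(offsetof(TYPE, MEMBER) + sizeof(((TYPE *)0)->MEMBER))

/*
 * Hypothetical model of _ib_copy_validate_udata_in(): "in"/"inlen" stand in
 * for the udata input buffer, "req"/"kernel_size" for the on-stack struct.
 */
static int copy_validate_in_sketch(const void *in, size_t inlen, void *req,
				   size_t kernel_size, size_t minimum_size)
{
	const unsigned char *bytes = in;
	size_t n = inlen < kernel_size ? inlen : kernel_size;

	if (inlen < minimum_size)
		return -EINVAL; /* older than the first ABI revision (assumed code) */
	for (size_t i = kernel_size; i < inlen; i++)
		if (bytes[i])
			return -EOPNOTSUPP; /* newer fields this kernel does not know */
	if (n)
		memcpy(req, in, n);
	if (kernel_size > n) /* assumed: fields userspace omitted read as zero */
		memset((unsigned char *)req + n, 0, kernel_size - n);
	return 0;
}

/* Model of _ib_respond_udata(): truncate to, or zero-pad up to, outlen */
static void respond_sketch(void *out, size_t outlen, const void *src,
			   size_t len)
{
	size_t n = len < outlen ? len : outlen;

	if (n)
		memcpy(out, src, n);
	if (outlen > n)
		memset((unsigned char *)out + n, 0, outlen - n);
}

/* Hypothetical ABI struct: "flags" was appended in a later revision */
struct demo_req {
	unsigned long long handle; /* original end of struct */
	unsigned long long flags;
};
```

With this model, an old binary that sends only `handle` is accepted and sees `flags == 0`, while a newer binary that sets bytes past `sizeof(struct demo_req)` is refused instead of having them silently ignored.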

+35 -1

include/uapi/rdma/bnxt_re-abi.h

···
 struct bnxt_re_cq_req {
 	__aligned_u64 cq_va;
 	__aligned_u64 cq_handle;
+	__aligned_u64 comp_mask;
 };
 
-enum bnxt_re_cq_mask {
+enum bnxt_re_resp_cq_mask {
 	BNXT_RE_CQ_TOGGLE_PAGE_SUPPORT = 0x1,
+};
+
+enum bnxt_re_req_cq_mask {
+	BNXT_RE_CQ_FIXED_NUM_CQE_ENABLE = 0x1,
 };
 
 struct bnxt_re_cq_resp {
···
 	BNXT_RE_OBJECT_ALLOC_PAGE = (1U << UVERBS_ID_NS_SHIFT),
 	BNXT_RE_OBJECT_NOTIFY_DRV,
 	BNXT_RE_OBJECT_GET_TOGGLE_MEM,
+	BNXT_RE_OBJECT_DBR,
+	BNXT_RE_OBJECT_DEFAULT_DBR,
 };
 
 enum bnxt_re_alloc_page_type {
···
 struct bnxt_re_query_device_ex_resp {
 	struct bnxt_re_packet_pacing_caps packet_pacing_caps;
 };
+
+struct bnxt_re_db_region {
+	__u32 dpi;
+	__u32 reserved;
+	__aligned_u64 umdbr;
+};
+
+enum bnxt_re_obj_dbr_alloc_attrs {
+	BNXT_RE_ALLOC_DBR_HANDLE = (1U << UVERBS_ID_NS_SHIFT),
+	BNXT_RE_ALLOC_DBR_ATTR,
+	BNXT_RE_ALLOC_DBR_OFFSET,
+};
+
+enum bnxt_re_obj_dbr_free_attrs {
+	BNXT_RE_FREE_DBR_HANDLE = (1U << UVERBS_ID_NS_SHIFT),
+};
+
+enum bnxt_re_obj_default_dbr_attrs {
+	BNXT_RE_DEFAULT_DBR_ATTR = (1U << UVERBS_ID_NS_SHIFT),
+};
+
+enum bnxt_re_obj_dpi_methods {
+	BNXT_RE_METHOD_DBR_ALLOC = (1U << UVERBS_ID_NS_SHIFT),
+	BNXT_RE_METHOD_DBR_FREE,
+	BNXT_RE_METHOD_GET_DEFAULT_DBR,
+};
+
 #endif /* __BNXT_RE_UVERBS_ABI_H__*/
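The `comp_mask` appended to `struct bnxt_re_cq_req` fits the pattern that `ib_copy_validate_udata_in_cm()` is built for: `cq_handle` remains the original end of the struct (16 bytes), the current struct is 24 bytes, and after the copy any `comp_mask` bit outside the permitted mask is rejected. This diff does not show whether bnxt_re actually calls that macro. In the sketch, the mirrored struct and `check_comp_mask_sketch()` are hypothetical, and `-EOPNOTSUPP` is an assumed stand-in for whatever `_ib_copy_validate_udata_cm_fail()` returns.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace mirror of struct bnxt_re_cq_req (__aligned_u64 -> uint64_t) */
struct bnxt_re_cq_req_sketch {
	uint64_t cq_va;
	uint64_t cq_handle; /* end of the struct before this change */
	uint64_t comp_mask; /* new: request-side feature bits */
};

/* Value copied from enum bnxt_re_req_cq_mask */
#define BNXT_RE_CQ_FIXED_NUM_CQE_ENABLE 0x1

/* Local copy of the kernel's offsetofend() helper */
#define offsetofend_sketch(TYPE, MEMBER) \
	(offsetof(TYPE, MEMBER) + sizeof(((TYPE *)0)->MEMBER))

/*
 * Hypothetical model of the comp_mask step of ib_copy_validate_udata_in_cm():
 * after a successful copy, any requested bit outside valid_cm is refused.
 */
static int check_comp_mask_sketch(uint64_t req_cm, uint64_t valid_cm)
{
	return (req_cm & ~valid_cm) ? -EOPNOTSUPP : 0;
}
```

If the kernel zero-fills fields that older userspace omits (an assumption), a library that sends only the original 16 bytes ends up with `comp_mask == 0`, which always passes the check.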

+6 -5

include/uapi/rdma/efa-abi.h

···
 /* SPDX-License-Identifier: ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) */
 /*
- * Copyright 2018-2025 Amazon.com, Inc. or its affiliates. All rights reserved.
+ * Copyright 2018-2026 Amazon.com, Inc. or its affiliates. All rights reserved.
  */
 
 #ifndef EFA_ABI_USER_H
···
  */
 
 enum {
-	EFA_ALLOC_UCONTEXT_CMD_COMP_TX_BATCH = 1 << 0,
-	EFA_ALLOC_UCONTEXT_CMD_COMP_MIN_SQ_WR = 1 << 1,
+	EFA_ALLOC_UCONTEXT_CMD_SUPP_CAPS_TX_BATCH = 1 << 0,
+	EFA_ALLOC_UCONTEXT_CMD_SUPP_CAPS_MIN_SQ_WR = 1 << 1,
 };
 
 struct efa_ibv_alloc_ucontext_cmd {
-	__u32 comp_mask;
+	__u32 supported_caps;
 	__u8 reserved_20[4];
 };
 
···
 	__u32 max_llq_size; /* bytes */
 	__u16 max_tx_batch; /* units of 64 bytes */
 	__u16 min_sq_wr;
-	__u8 reserved_a0[4];
+	__u16 inline_buf_size_ex;
+	__u8 reserved_b0[2];
 };
 
 struct efa_ibv_alloc_pd_resp {
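The EFA change turns four reserved response bytes into a `__u16 inline_buf_size_ex` plus two reserved bytes, so the response size and the offsets of existing fields do not move. The reserved fields appear to be named after their bit offset (`reserved_20` sits at byte 4 of the command struct, and 0x20 = 32 bits), which would explain the rename from `reserved_a0` (byte 20) to `reserved_b0` (byte 22). The sketch checks this with mirrored structs; `earlier_fields` is a hypothetical 12-byte placeholder inferred from that naming, not the real leading members of the response struct.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of the response layout before the change */
struct efa_resp_old_sketch {
	uint32_t earlier_fields[3]; /* placeholder for bytes 0..11 (inferred) */
	uint32_t max_llq_size;
	uint16_t max_tx_batch;
	uint16_t min_sq_wr;
	uint8_t reserved_a0[4];
};

/* Hypothetical mirror of the response layout after the change */
struct efa_resp_new_sketch {
	uint32_t earlier_fields[3];
	uint32_t max_llq_size;
	uint16_t max_tx_batch;
	uint16_t min_sq_wr;
	uint16_t inline_buf_size_ex; /* reuses previously reserved bytes */
	uint8_t reserved_b0[2];
};

/* Size is unchanged and the new field sits where the old padding began */
_Static_assert(sizeof(struct efa_resp_old_sketch) ==
	       sizeof(struct efa_resp_new_sketch), "response size unchanged");
_Static_assert(offsetof(struct efa_resp_new_sketch, inline_buf_size_ex) ==
	       offsetof(struct efa_resp_old_sketch, reserved_a0),
	       "new field occupies the old reserved bytes");

/* Renamed command bits keep their values (copied from the hunk) */
enum {
	EFA_ALLOC_UCONTEXT_CMD_SUPP_CAPS_TX_BATCH = 1 << 0,
	EFA_ALLOC_UCONTEXT_CMD_SUPP_CAPS_MIN_SQ_WR = 1 << 1,
};
```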

+1

include/uapi/rdma/ib_user_ioctl_verbs.h

···
 
 enum ib_uverbs_core_support {
 	IB_UVERBS_CORE_SUPPORT_OPTIONAL_MR_ACCESS = 1 << 0,
+	IB_UVERBS_CORE_SUPPORT_ROBUST_UDATA = 1 << 1,
 };
 
 enum ib_uverbs_access_flags {

+1

include/uapi/rdma/mlx5_user_ioctl_cmds.h

···
 	MLX5_IB_ATTR_VAR_OBJ_ALLOC_MMAP_OFFSET,
 	MLX5_IB_ATTR_VAR_OBJ_ALLOC_MMAP_LENGTH,
 	MLX5_IB_ATTR_VAR_OBJ_ALLOC_PAGE_ID,
+	MLX5_IB_ATTR_VAR_OBJ_ALLOC_FLAGS,
 };
 
 enum mlx5_ib_var_obj_destroy_attrs {

+4

include/uapi/rdma/mlx5_user_ioctl_verbs.h

···
 	MLX5_IB_UAPI_QUERY_PORT_ESW_OWNER_VHCA_ID = 1 << 5,
 };
 
+enum mlx5_ib_uapi_var_alloc_flags {
+	MLX5_IB_UAPI_VAR_ALLOC_FLAG_TLP = 1 << 0,
+};
+
 struct mlx5_ib_uapi_reg {
 	__u32 value;
 	__u32 mask;

+22

include/uapi/rdma/rdma_netlink.h

···
 
 	RDMA_NLDEV_CMD_MONITOR,
 
+	RDMA_NLDEV_CMD_FRMR_POOLS_GET, /* can dump */
+
+	RDMA_NLDEV_CMD_FRMR_POOLS_SET,
+
 	RDMA_NLDEV_NUM_OPS
 };
 
···
 	RDMA_NLDEV_SYS_ATTR_MONITOR_MODE,	/* u8 */
 
 	RDMA_NLDEV_ATTR_STAT_OPCOUNTER_ENABLED,	/* u8 */
+
+	/*
+	 * FRMR Pools attributes
+	 */
+	RDMA_NLDEV_ATTR_FRMR_POOLS,		/* nested table */
+	RDMA_NLDEV_ATTR_FRMR_POOL_ENTRY,	/* nested table */
+	RDMA_NLDEV_ATTR_FRMR_POOL_KEY,		/* nested table */
+	RDMA_NLDEV_ATTR_FRMR_POOL_KEY_ATS,	/* u8 */
+	RDMA_NLDEV_ATTR_FRMR_POOL_KEY_ACCESS_FLAGS,	/* u32 */
+	RDMA_NLDEV_ATTR_FRMR_POOL_KEY_VENDOR_KEY,	/* u64 */
+	RDMA_NLDEV_ATTR_FRMR_POOL_KEY_NUM_DMA_BLOCKS,	/* u64 */
+	RDMA_NLDEV_ATTR_FRMR_POOL_QUEUE_HANDLES,	/* u32 */
+	RDMA_NLDEV_ATTR_FRMR_POOL_MAX_IN_USE,	/* u64 */
+	RDMA_NLDEV_ATTR_FRMR_POOL_IN_USE,	/* u64 */
+	RDMA_NLDEV_ATTR_FRMR_POOLS_AGING_PERIOD,	/* u32 */
+	RDMA_NLDEV_ATTR_FRMR_POOL_PINNED_HANDLES,	/* u32 */
+	RDMA_NLDEV_ATTR_FRMR_POOL_KEY_KERNEL_VENDOR_KEY,	/* u64 */
+
 	/*
 	 * Always the end
 	 */

+1

tools/testing/selftests/Makefile

···
 TARGETS += pstore
 TARGETS += ptrace
 TARGETS += openat2
+TARGETS += rdma
 TARGETS += resctrl
 TARGETS += riscv
 TARGETS += rlimits

+7

tools/testing/selftests/rdma/Makefile

···
+# SPDX-License-Identifier: GPL-2.0
+TEST_PROGS := rxe_rping_between_netns.sh \
+	rxe_ipv6.sh \
+	rxe_socket_with_netns.sh \
+	rxe_test_NETDEV_UNREGISTER.sh
+
+include ../lib.mk

+3

tools/testing/selftests/rdma/config

···
+CONFIG_TUN
+CONFIG_VETH
+CONFIG_RDMA_RXE

+63

tools/testing/selftests/rdma/rxe_ipv6.sh

···
+#!/bin/bash
+
+# Configuration
+NS_NAME="net6"
+VETH_HOST="veth0"
+VETH_NS="veth1"
+RXE_NAME="rxe6"
+PORT=4791
+IP6_ADDR="2001:db8::1/64"
+
+exec > /dev/null
+
+# Cleanup function to run on exit (even on failure)
+cleanup() {
+	ip netns del "$NS_NAME" 2>/dev/null
+	modprobe -r rdma_rxe 2>/dev/null
+	echo "Done."
+}
+trap cleanup EXIT
+
+# 1. Prerequisites check
+for mod in tun veth rdma_rxe; do
+	if ! modinfo "$mod" >/dev/null 2>&1; then
+		echo "Error: Kernel module '$mod' not found."
+		exit 1
+	fi
+done
+
+modprobe rdma_rxe
+
+# 2. Setup Namespace and Networking
+echo "Setting up IPv6 network namespace..."
+ip netns add "$NS_NAME"
+ip link add "$VETH_HOST" type veth peer name "$VETH_NS"
+ip link set "$VETH_NS" netns "$NS_NAME"
+ip netns exec "$NS_NAME" ip addr add "$IP6_ADDR" dev "$VETH_NS"
+ip netns exec "$NS_NAME" ip link set "$VETH_NS" up
+ip link set "$VETH_HOST" up
+
+# 3. Add RDMA Link
+echo "Adding RDMA RXE link..."
+if ! ip netns exec "$NS_NAME" rdma link add "$RXE_NAME" type rxe netdev "$VETH_NS"; then
+	echo "Error: Failed to create RXE link."
+	exit 1
+fi
+
+# 4. Verification: Port should be listening
+# Using -H to skip headers and -q for quiet exit codes
+if ! ip netns exec "$NS_NAME" ss -Hul6n sport = :$PORT | grep -q ":$PORT"; then
+	echo "Error: UDP port $PORT is NOT listening after link creation."
+	exit 1
+fi
+echo "Verified: Port $PORT is active."
+
+# 5. Removal and Verification
+echo "Deleting RDMA link..."
+ip netns exec "$NS_NAME" rdma link del "$RXE_NAME"
+
+if ip netns exec "$NS_NAME" ss -Hul6n sport = :$PORT | grep -q ":$PORT"; then
+	echo "Error: UDP port $PORT still active after link deletion."
+	exit 1
+fi
+echo "Verified: Port $PORT closed successfully."

+85

tools/testing/selftests/rdma/rxe_rping_between_netns.sh

···
+#!/bin/bash
+
+# Configuration
+NS="test1"
+VETH_A="veth-a"
+VETH_B="veth-b"
+IP_A="1.1.1.1"
+IP_B="1.1.1.2"
+PORT=4791
+
+exec > /dev/null
+
+# --- Cleanup Routine ---
+cleanup() {
+	echo "Cleaning up resources..."
+	rdma link del rxe1 2>/dev/null
+	ip netns exec "$NS" rdma link del rxe0 2>/dev/null
+	ip link delete "$VETH_B" 2>/dev/null
+	ip netns del "$NS" 2>/dev/null
+	modprobe -r rdma_rxe 2>/dev/null
+}
+trap cleanup EXIT
+
+# --- Prerequisite Checks ---
+if [[ $EUID -ne 0 ]]; then
+	echo "This script must be run as root"
+	exit 1
+fi
+
+modprobe rdma_rxe || { echo "Failed to load rdma_rxe"; exit 1; }
+
+# --- Setup Network Topology ---
+echo "Setting up network namespace and veth pair..."
+ip netns add "$NS"
+ip link add "$VETH_A" type veth peer name "$VETH_B"
+ip link set "$VETH_A" netns "$NS"
+
+# Configure Namespace side (test1)
+ip netns exec "$NS" ip addr add "$IP_A/24" dev "$VETH_A"
+ip netns exec "$NS" ip link set "$VETH_A" up
+ip netns exec "$NS" ip link set lo up
+
+# Configure Host side
+ip addr add "$IP_B/24" dev "$VETH_B"
+ip link set "$VETH_B" up
+
+# --- RXE Link Creation ---
+echo "Creating RDMA links..."
+ip netns exec "$NS" rdma link add rxe0 type rxe netdev "$VETH_A"
+rdma link add rxe1 type rxe netdev "$VETH_B"
+
+# Verify UDP 4791 is listening
+check_port() {
+	local target=$1 # "host" or "ns"
+	if [ "$target" == "ns" ]; then
+		ip netns exec "$NS" ss -Huln sport == :$PORT | grep -q ":$PORT"
+	else
+		ss -Huln sport == :$PORT | grep -q ":$PORT"
+	fi
+}
+
+check_port "ns" || { echo "Error: RXE port not listening in namespace"; exit 1; }
+check_port "host" || { echo "Error: RXE port not listening on host"; exit 1; }
+
+# --- Connectivity Test ---
+echo "Testing connectivity with rping..."
+ping -c 2 -W 1 "$IP_A" > /dev/null || { echo "Ping failed"; exit 1; }
+
+# Start rping server in background
+ip netns exec "$NS" rping -s -a "$IP_A" -v > /dev/null 2>&1 &
+RPING_PID=$!
+sleep 1 # Allow server to bind
+
+# Run rping client
+rping -c -a "$IP_A" -d -v -C 3
+RESULT=$?
+
+kill $RPING_PID 2>/dev/null
+
+if [ $RESULT -eq 0 ]; then
+	echo "SUCCESS: RDMA traffic verified."
+else
+	echo "FAILURE: rping failed."
+	exit 1
+fi

+76

tools/testing/selftests/rdma/rxe_socket_with_netns.sh

···
+#!/bin/bash
+
+# Configuration
+PORT=4791
+MODS=("tun" "rdma_rxe")
+
+exec > /dev/null
+
+# --- Helper: Cleanup Routine ---
+cleanup() {
+	echo "Cleaning up resources..."
+	rdma link del rxe1 2>/dev/null
+	rdma link del rxe0 2>/dev/null
+	ip link del tun0 2>/dev/null
+	ip link del tun1 2>/dev/null
+	for m in "${MODS[@]}"; do modprobe -r "$m" 2>/dev/null; done
+}
+
+# Ensure cleanup runs on script exit or interrupt
+trap cleanup EXIT
+
+# --- Phase 1: Environment Check ---
+if [[ $EUID -ne 0 ]]; then
+	echo "Error: This script must be run as root."
+	exit 1
+fi
+
+for m in "${MODS[@]}"; do
+	modprobe "$m" || { echo "Error: Failed to load $m"; exit 1; }
+done
+
+# --- Phase 2: Create Interfaces & RXE Links ---
+echo "Creating tun0 (1.1.1.1) and rxe0..."
+ip tuntap add mode tun tun0
+ip addr add 1.1.1.1/24 dev tun0
+ip link set tun0 up
+rdma link add rxe0 type rxe netdev tun0
+
+# Verify port 4791 is listening
+if ! ss -Huln sport = :$PORT | grep -q ":$PORT"; then
+	echo "Error: UDP port $PORT not found after rxe0 creation"
+	exit 1
+fi
+
+echo "Creating tun1 (2.2.2.2) and rxe1..."
+ip tuntap add mode tun tun1
+ip addr add 2.2.2.2/24 dev tun1
+ip link set tun1 up
+rdma link add rxe1 type rxe netdev tun1
+
+# Verify port 4791 is still listening
+if ! ss -Huln sport = :$PORT | grep -q ":$PORT"; then
+	echo "Error: UDP port $PORT missing after rxe1 creation"
+	exit 1
+fi
+
+# --- Phase 3: Targeted Deletion ---
+echo "Deleting rxe1..."
+rdma link del rxe1
+
+# Port should still be active because rxe0 is still alive
+if ! ss -Huln sport = :$PORT | grep -q ":$PORT"; then
+	echo "Error: UDP port $PORT closed prematurely"
+	exit 1
+fi
+
+echo "Deleting rxe0..."
+rdma link del rxe0
+
+# Port should now be gone
+if ss -Huln sport = :$PORT | grep -q ":$PORT"; then
+	echo "Error: UDP port $PORT still exists after all links deleted"
+	exit 1
+fi
+
+echo "Test passed successfully."

+63

tools/testing/selftests/rdma/rxe_test_NETDEV_UNREGISTER.sh

···
+#!/bin/bash
+
+# Configuration
+DEV_NAME="tun0"
+RXE_NAME="rxe0"
+RDMA_PORT=4791
+
+exec > /dev/null
+
+# --- Cleanup Routine ---
+# Ensures environment is clean even if the script hits an error
+cleanup() {
+	echo "Performing cleanup..."
+	rdma link del $RXE_NAME 2>/dev/null
+	ip link del $DEV_NAME 2>/dev/null
+	modprobe -r rdma_rxe 2>/dev/null
+}
+trap cleanup EXIT
+
+# 1. Dependency Check
+if ! modinfo rdma_rxe >/dev/null 2>&1; then
+	echo "Error: rdma_rxe module not found."
+	exit 1
+fi
+
+modprobe rdma_rxe
+
+# 2. Setup TUN Device
+echo "Creating $DEV_NAME..."
+ip tuntap add mode tun "$DEV_NAME"
+ip addr add 1.1.1.1/24 dev "$DEV_NAME"
+ip link set "$DEV_NAME" up
+
+# 3. Attach RXE Link
+echo "Attaching RXE link $RXE_NAME to $DEV_NAME..."
+rdma link add "$RXE_NAME" type rxe netdev "$DEV_NAME"
+
+# 4. Verification: Port Check
+# Use -H (no header) and -q (quiet) for cleaner scripting logic
+if ! ss -Huln sport == :$RDMA_PORT | grep -q ":$RDMA_PORT"; then
+	echo "Error: UDP port $RDMA_PORT is not listening."
+	exit 1
+fi
+echo "Verified: RXE is listening on UDP $RDMA_PORT."
+
+# 5. Trigger NETDEV_UNREGISTER
+# We delete the underlying device without deleting the RDMA link first.
+echo "Triggering NETDEV_UNREGISTER by deleting $DEV_NAME..."
+ip link del "$DEV_NAME"
+
+# 6. Final Verification
+# The RXE link and the UDP port should be automatically cleaned up by the kernel.
+if rdma link show "$RXE_NAME" 2>/dev/null; then
+	echo "Error: $RXE_NAME still exists after netdev removal."
+	exit 1
+fi
+
+if ss -Huln sport == :$RDMA_PORT | grep -q ":$RDMA_PORT"; then
+	echo "Error: UDP port $RDMA_PORT still listening after netdev removal."
+	exit 1
+fi
+
+echo "Success: NETDEV_UNREGISTER handled correctly."
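Each of the new selftests checks RXE's UDP port (4791) with `ss`. Where a compiled helper is preferable, one alternative technique is to attempt a UDP bind on that port: `EADDRINUSE` means some socket in the current network namespace already holds it. This sketch is IPv4-only (the IPv6 test would need an `AF_INET6` variant), must run inside the namespace under test, and treats a kernel-owned RXE tunnel socket as a bind conflict, which is an assumption worth confirming. `udp4_port_in_use()` is a hypothetical helper, not part of the selftests.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Returns 1 if binding 0.0.0.0:port fails with EADDRINUSE (another socket
 * already holds the port), 0 if the bind succeeds (port free), and -errno
 * for any other failure.
 */
static int udp4_port_in_use(uint16_t port)
{
	struct sockaddr_in addr;
	int fd, ret;

	fd = socket(AF_INET, SOCK_DGRAM, 0);
	if (fd < 0)
		return -errno;

	memset(&addr, 0, sizeof(addr));
	addr.sin_family = AF_INET;
	addr.sin_addr.s_addr = htonl(INADDR_ANY);
	addr.sin_port = htons(port);

	/* Capture errno before close() can overwrite it */
	if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) == 0)
		ret = 0;
	else
		ret = (errno == EADDRINUSE) ? 1 : -errno;

	close(fd);
	return ret;
}
```

Because a successful probe briefly binds the port itself, it should not run concurrently with `rdma link add` on the same namespace.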
