Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

RDMA/mlx5: Allow larger pages in DevX umem

The umem DMA list calculation was locked at 4k pages due to confusion
around how this API works and is used when larger pages are present.

The conclusion is:

- umems cannot extend past what is mapped into the process, so creating
a large page size and referring to a sub-range is not allowed

- umems must always have a page offset of zero, except for sub-PAGE_SIZE
umems

- The umem_offset feature for creating multiple objects inside a umem
is buggy and isn't used anywhere. Thus we can assume all users of the
current API have umem_offset == 0 as well

Provide a new page size calculator that limits the DMA list to the VA
range and enforces umem_offset == 0.
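
The reduction step of such a calculator can be sketched in plain C outside
the kernel. This is only an illustrative model, not the driver code:
struct umem_model, MODEL_PAGE_SIZE, model_dma_offset() and
model_reduce_page_size() are hypothetical names, and the DMA offset is
assumed to be the low bits of the umem's starting DMA address.

```c
#include <stdint.h>

/* Hypothetical stand-in for struct ib_umem: only what this sketch needs. */
struct umem_model {
	uint64_t dma_start;	/* assumed DMA address of the first byte */
	uint64_t length;	/* length of the umem in bytes */
};

#define MODEL_PAGE_SIZE 4096ULL	/* assumed CPU page size */

/* Offset of the umem start inside a page of the given power-of-two size. */
static uint64_t model_dma_offset(const struct umem_model *u, uint64_t page_size)
{
	return u->dma_start & (page_size - 1);
}

/*
 * Halve the candidate page size until the umem starts on a page boundary
 * and its length is a whole number of pages, or until the candidate is no
 * larger than the CPU page size (where a non-zero offset is tolerated).
 */
static uint64_t model_reduce_page_size(const struct umem_model *u,
				       uint64_t page_size)
{
	while ((model_dma_offset(u, page_size) != 0 ||
		(u->length % page_size) != 0) &&
	       page_size > MODEL_PAGE_SIZE)
		page_size /= 2;

	return page_size;
}
```

In this model a 3 MiB umem that starts on a 2 MiB boundary is reduced from
a 2 MiB candidate to 1 MiB, because 3 MiB is not a multiple of 2 MiB.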

Allow user space to specify the page sizes it can accept. This bitmap
must be derived from the intended use of the umem, based on per-usage
HW limitations.
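
As a rough illustration of the bitmap convention (bit n set means a page
size of 2^n bytes is acceptable), the sketch below builds such a bitmap and
clamps it to the umem length, in the spirit of the GENMASK_ULL() call in
the patch. The helpers genmask64(), order_base_2_u64() and
clamp_pgsz_bitmap() are hypothetical user-space names, and both page shifts
are passed in as parameters rather than taken from kernel headers.

```c
#include <stdint.h>

/* Bits lo..hi set, inclusive (0 <= lo <= hi <= 63). */
static uint64_t genmask64(unsigned int hi, unsigned int lo)
{
	return (~0ULL >> (63 - hi)) & (~0ULL << lo);
}

/* Smallest n such that 2^n >= v, capped at 63; returns 0 for v <= 1. */
static unsigned int order_base_2_u64(uint64_t v)
{
	unsigned int n = 0;

	while (n < 63 && (1ULL << n) < v)
		n++;
	return n;
}

/*
 * Drop page sizes larger than the umem itself (but never below the CPU
 * page size) and smaller than the adapter page size.
 */
static uint64_t clamp_pgsz_bitmap(uint64_t bitmap, uint64_t length,
				  unsigned int cpu_page_shift,
				  unsigned int adapter_page_shift)
{
	unsigned int hi = order_base_2_u64(length);

	if (hi < cpu_page_shift)
		hi = cpu_page_shift;
	return bitmap & genmask64(hi, adapter_page_shift);
}
```

For example, assuming both shifts are 12, a caller that accepts 4 KiB,
2 MiB and 1 GiB pages would pass (1ULL << 12) | (1ULL << 21) | (1ULL << 30);
for a 4 MiB umem the clamp keeps only the 4 KiB and 2 MiB bits.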

Link: https://lore.kernel.org/r/20210304130501.1102577-4-leon@kernel.org
Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

+55 -10
+54 -10
drivers/infiniband/hw/mlx5/devx.c
···
 	return 0;
 }
 
+static unsigned int devx_umem_find_best_pgsize(struct ib_umem *umem,
+					       unsigned long pgsz_bitmap)
+{
+	unsigned long page_size;
+
+	/* Don't bother checking larger page sizes as offset must be zero and
+	 * total DEVX umem length must be equal to total umem length.
+	 */
+	pgsz_bitmap &= GENMASK_ULL(max_t(u64, order_base_2(umem->length),
+					 PAGE_SHIFT),
+				   MLX5_ADAPTER_PAGE_SHIFT);
+	if (!pgsz_bitmap)
+		return 0;
+
+	page_size = ib_umem_find_best_pgoff(umem, pgsz_bitmap, U64_MAX);
+	if (!page_size)
+		return 0;
+
+	/* If the page_size is less than the CPU page size then we can use the
+	 * offset and create a umem which is a subset of the page list.
+	 * For larger page sizes we can't be sure the DMA list reflects the
+	 * VA so we must ensure that the umem extent is exactly equal to the
+	 * page list. Reduce the page size until one of these cases is true.
+	 */
+	while ((ib_umem_dma_offset(umem, page_size) != 0 ||
+		(umem->length % page_size) != 0) &&
+		page_size > PAGE_SIZE)
+		page_size /= 2;
+
+	return page_size;
+}
+
 static int devx_umem_reg_cmd_alloc(struct mlx5_ib_dev *dev,
 				   struct uverbs_attr_bundle *attrs,
 				   struct devx_umem *obj,
 				   struct devx_umem_reg_cmd *cmd)
 {
+	unsigned long pgsz_bitmap;
 	unsigned int page_size;
 	__be64 *mtt;
 	void *umem;
+	int ret;
 
 	/*
-	 * We don't know what the user intends to use this umem for, but the HW
-	 * restrictions must be met. MR, doorbell records, QP, WQ and CQ all
-	 * have different requirements. Since we have no idea how to sort this
-	 * out, only support PAGE_SIZE with the expectation that userspace will
-	 * provide the necessary alignments inside the known PAGE_SIZE and that
-	 * FW will check everything.
+	 * If the user does not pass in pgsz_bitmap then the user promises not
+	 * to use umem_offset!=0 in any commands that allocate on top of the
+	 * umem.
+	 *
+	 * If the user wants to use a umem_offset then it must pass in
+	 * pgsz_bitmap which guides the maximum page size and thus maximum
+	 * object alignment inside the umem. See the PRM.
+	 *
+	 * Users are not allowed to use IOVA here, mkeys are not supported on
+	 * umem.
 	 */
-	page_size = ib_umem_find_best_pgoff(
-		obj->umem, PAGE_SIZE,
-		__mlx5_page_offset_to_bitmask(__mlx5_bit_sz(umem, page_offset),
-					      0));
+	ret = uverbs_get_const_default(&pgsz_bitmap, attrs,
+			MLX5_IB_ATTR_DEVX_UMEM_REG_PGSZ_BITMAP,
+			GENMASK_ULL(63,
+				    min(PAGE_SHIFT, MLX5_ADAPTER_PAGE_SHIFT)));
+	if (ret)
+		return ret;
+
+	page_size = devx_umem_find_best_pgsize(obj->umem, pgsz_bitmap);
 	if (!page_size)
 		return -EINVAL;
 
···
 		UA_MANDATORY),
 	UVERBS_ATTR_FLAGS_IN(MLX5_IB_ATTR_DEVX_UMEM_REG_ACCESS,
 			     enum ib_access_flags),
+	UVERBS_ATTR_CONST_IN(MLX5_IB_ATTR_DEVX_UMEM_REG_PGSZ_BITMAP,
+			     u64),
 	UVERBS_ATTR_PTR_OUT(MLX5_IB_ATTR_DEVX_UMEM_REG_OUT_ID,
 			    UVERBS_ATTR_TYPE(u32),
 			    UA_MANDATORY));
+1
include/uapi/rdma/mlx5_user_ioctl_cmds.h
···
 	MLX5_IB_ATTR_DEVX_UMEM_REG_LEN,
 	MLX5_IB_ATTR_DEVX_UMEM_REG_ACCESS,
 	MLX5_IB_ATTR_DEVX_UMEM_REG_OUT_ID,
+	MLX5_IB_ATTR_DEVX_UMEM_REG_PGSZ_BITMAP,
 };
 
 enum mlx5_ib_devx_umem_dereg_attrs {