Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

Merge tag 'mlx5-updates-2023-03-28' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
mlx5-updates-2023-03-28

Dragos Tatulea says:
====================

net/mlx5e: RX, Drop page_cache and fully use page_pool

For page allocation on the rx path, the mlx5e driver has been using an
internal page cache in tandem with the page pool. The internal page
cache uses a queue for page recycling which has the issue of head of
queue blocking.
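
To make the head of queue blocking concrete, here is a minimal userspace sketch of a ring-based page cache. It is loosely modeled on the removed mlx5e_rx_cache_put()/mlx5e_rx_cache_get() helpers; the model_* names, the ring size and the plain integer refcount are assumptions made for illustration and are not driver API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; names are illustrative, not driver API. */
#define MODEL_CACHE_SIZE 8 /* power of two, so the index can wrap with a mask */

struct model_page {
	int refcount; /* 1 means only the cache holds the page */
};

struct model_cache {
	unsigned int head;
	unsigned int tail;
	struct model_page *slots[MODEL_CACHE_SIZE];
};

/* Append a page at the tail; fails when the ring is full. */
static bool model_cache_put(struct model_cache *c, struct model_page *p)
{
	unsigned int tail_next = (c->tail + 1) & (MODEL_CACHE_SIZE - 1);

	assert(p != NULL);
	if (tail_next == c->head)
		return false;
	c->slots[c->tail] = p;
	c->tail = tail_next;
	return true;
}

/* Only the head is inspected: a busy head hides every page queued behind it. */
static struct model_page *model_cache_get(struct model_cache *c)
{
	struct model_page *p;

	if (c->head == c->tail)
		return NULL; /* cache is empty */
	p = c->slots[c->head];
	if (p->refcount != 1)
		return NULL; /* head still in use elsewhere: blocked */
	c->head = (c->head + 1) & (MODEL_CACHE_SIZE - 1);
	return p;
}
```

If a page with an elevated refcount sits at the head, model_cache_get() returns NULL even when an idle page is queued right behind it, so the caller has to allocate a fresh page instead of recycling.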

This patch series drops the internal page_cache altogether and uses the
page_pool to implement everything that was done by the page_cache
before:
* Let the page_pool handle dma mapping and unmapping.
* Use fragmented pages with fragment counter instead of tracking via
page ref.
* Enable skb recycling.

The patch series has the following effects on the rx path:

* Improved performance for the cases when there was low page recycling
due to head of queue blocking in the internal page_cache. The test
for this was running a single iperf TCP stream to a rx queue
which is bound on the same cpu as the application.

|-------------+--------+--------+------+---------|
| rq type     | before | after  | unit | diff    |
|-------------+--------+--------+------+---------|
| striding rq | 30.1   | 31.4   | Gbps | 4.14 %  |
| legacy rq   | 30.2   | 33.0   | Gbps | 8.48 %  |
|-------------+--------+--------+------+---------|

* Small XDP performance degradation. The test was an XDP drop
program running on a single rx queue with small incoming packets.
The results look like this:

|-------------+----------+----------+------+---------|
| rq type     | before   | after    | unit | diff    |
|-------------+----------+----------+------+---------|
| striding rq | 19725449 | 18544617 | pps  | -6.37 % |
| legacy rq   | 19879931 | 18631841 | pps  | -6.70 % |
|-------------+----------+----------+------+---------|

This will be handled in a different patch series by adding support for
multi-packet per page.

* For other cases the performance is roughly the same.
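
A note on reading the diff columns: the percentages in both tables are consistent with (after - before) / after, not with the more common (after - before) / before. Against the before value, the striding rq throughput gain would come out at roughly 4.32 % instead of 4.14 %. The helper below is a small sketch of that arithmetic; the function name is made up for illustration.

```c
#include <assert.h>

/* Relative change in percent, taken against the "after" value. */
static double diff_pct_vs_after(double before, double after)
{
	assert(after != 0.0);
	return (after - before) / after * 100.0;
}
```

Feeding the four before/after pairs from the tables into this helper reproduces the listed percentages to two decimal places.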

The above numbers were obtained on the following system:
24 core Intel(R) Xeon(R) Platinum 8380 CPU @ 2.30GHz
32 GB RAM
ConnectX-7 single port

The breakdown of the patch series is the following:
* Preparations for introducing the mlx5e_frag_page struct.
* Delete the mlx5e_page_cache struct.
* Enable dma mapping from page_pool.
* Enable skb recycling and fragment counting.
* Do deferred release of pages (just before alloc) to ensure better
page_pool cache utilization.
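
Deferred release can be sketched as a small state machine per fragment. The two flag names and the mask comparison follow mlx5e_frag_can_release() from the series; the model_* types and the refill helper are hypothetical simplifications that only count releases instead of touching real pages.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_BIT(n) (1u << (n))

/* Flag meanings follow the series; everything else is a hypothetical model. */
enum model_frag_flag {
	MODEL_FRAG_LAST_IN_PAGE,
	MODEL_FRAG_SKIP_RELEASE,
};

struct model_frag {
	unsigned char flags;
};

/* Release only if this is the last frag in its page and release is not skipped. */
static bool model_frag_can_release(const struct model_frag *frag)
{
	unsigned int mask = MODEL_BIT(MODEL_FRAG_LAST_IN_PAGE) |
			    MODEL_BIT(MODEL_FRAG_SKIP_RELEASE);

	return (frag->flags & mask) == MODEL_BIT(MODEL_FRAG_LAST_IN_PAGE);
}

/* Refill one slot: release the previous page (if any) right before the new
 * allocation. Returns the number of pages released by this call (0 or 1).
 */
static int model_refill(struct model_frag *frag)
{
	int released;

	assert(frag != NULL);
	released = model_frag_can_release(frag) ? 1 : 0;
	/* The slot now owns a page, so the next refill may release it. */
	frag->flags &= (unsigned char)~MODEL_BIT(MODEL_FRAG_SKIP_RELEASE);
	return released;
}
```

A slot starts with SKIP_RELEASE set because nothing has been allocated yet, so the first refill releases nothing. Every later refill first releases the page left over from the previous round, which gives in-flight fragments more time to return to the page_pool before the page itself is put back.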

====================

* tag 'mlx5-updates-2023-03-28' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux:
net/mlx5e: RX, Remove unnecessary recycle parameter and page_cache stats
net/mlx5e: RX, Break the wqe bulk refill in smaller chunks
net/mlx5e: RX, Increase WQE bulk size for legacy rq
net/mlx5e: RX, Split off release path for xsk buffers for legacy rq
net/mlx5e: RX, Defer page release in legacy rq for better recycling
net/mlx5e: RX, Change wqe last_in_page field from bool to bit flags
net/mlx5e: RX, Defer page release in striding rq for better recycling
net/mlx5e: RX, Rename xdp_xmit_bitmap to a more generic name
net/mlx5e: RX, Enable skb page recycling through the page_pool
net/mlx5e: RX, Enable dma map and sync from page_pool allocator
net/mlx5e: RX, Remove internal page_cache
net/mlx5e: RX, Store SHAMPO header pages in array
net/mlx5e: RX, Remove alloc unit layout constraint for striding rq
net/mlx5e: RX, Remove alloc unit layout constraint for legacy rq
net/mlx5e: RX, Remove mlx5e_alloc_unit argument in page allocation
====================

Link: https://lore.kernel.org/r/20230328205623.142075-1-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

+467 -392
-26
Documentation/networking/device_drivers/ethernet/mellanox/mlx5/counters.rst
@@ -346,32 +346,6 @@
      - The number of receive packets with CQE compression on ring i [#accel]_.
      - Acceleration
 
-   * - `rx[i]_cache_reuse`
-     - The number of events of successful reuse of a page from a driver's
-       internal page cache.
-     - Acceleration
-
-   * - `rx[i]_cache_full`
-     - The number of events of full internal page cache where driver can't put a
-       page back to the cache for recycling (page will be freed).
-     - Acceleration
-
-   * - `rx[i]_cache_empty`
-     - The number of events where cache was empty - no page to give. Driver
-       shall allocate new page.
-     - Acceleration
-
-   * - `rx[i]_cache_busy`
-     - The number of events where cache head was busy and cannot be recycled.
-       Driver allocated new page.
-     - Acceleration
-
-   * - `rx[i]_cache_waive`
-     - The number of cache evacuation. This can occur due to page move to
-       another NUMA node or page was pfmemalloc-ed and should be freed as soon
-       as possible.
-     - Acceleration
-
    * - `rx[i]_arfs_err`
      - Number of flow rules that failed to be added to the flow table.
      - Error
+32 -19
drivers/net/ethernet/mellanox/mlx5/core/en.h
··· 475 475 cqe_ts_to_ns ptp_cyc2time; 476 476 } ____cacheline_aligned_in_smp; 477 477 478 - union mlx5e_alloc_unit { 479 - struct page *page; 480 - struct xdp_buff *xsk; 481 - }; 482 - 483 478 /* XDP packets can be transmitted in different ways. On completion, we need to 484 479 * distinguish between them to clean up things in a proper way. 485 480 */ ··· 600 605 struct work_struct recover_work; 601 606 } ____cacheline_aligned_in_smp; 602 607 608 + struct mlx5e_frag_page { 609 + struct page *page; 610 + u16 frags; 611 + }; 612 + 613 + enum mlx5e_wqe_frag_flag { 614 + MLX5E_WQE_FRAG_LAST_IN_PAGE, 615 + MLX5E_WQE_FRAG_SKIP_RELEASE, 616 + }; 617 + 603 618 struct mlx5e_wqe_frag_info { 604 - union mlx5e_alloc_unit *au; 619 + union { 620 + struct mlx5e_frag_page *frag_page; 621 + struct xdp_buff **xskp; 622 + }; 605 623 u32 offset; 606 - bool last_in_page; 624 + u8 flags; 625 + }; 626 + 627 + union mlx5e_alloc_units { 628 + DECLARE_FLEX_ARRAY(struct mlx5e_frag_page, frag_pages); 629 + DECLARE_FLEX_ARRAY(struct page *, pages); 630 + DECLARE_FLEX_ARRAY(struct xdp_buff *, xsk_buffs); 607 631 }; 608 632 609 633 struct mlx5e_mpw_info { 610 634 u16 consumed_strides; 611 - DECLARE_BITMAP(xdp_xmit_bitmap, MLX5_MPWRQ_MAX_PAGES_PER_WQE); 612 - union mlx5e_alloc_unit alloc_units[]; 635 + DECLARE_BITMAP(skip_release_bitmap, MLX5_MPWRQ_MAX_PAGES_PER_WQE); 636 + union mlx5e_alloc_units alloc_units; 613 637 }; 614 638 615 639 #define MLX5E_MAX_RX_FRAGS 4 ··· 639 625 #define MLX5E_CACHE_UNIT (MLX5_MPWRQ_MAX_PAGES_PER_WQE > NAPI_POLL_WEIGHT ? 
\ 640 626 MLX5_MPWRQ_MAX_PAGES_PER_WQE : NAPI_POLL_WEIGHT) 641 627 #define MLX5E_CACHE_SIZE (4 * roundup_pow_of_two(MLX5E_CACHE_UNIT)) 642 - struct mlx5e_page_cache { 643 - u32 head; 644 - u32 tail; 645 - struct page *page_cache[MLX5E_CACHE_SIZE]; 646 - }; 647 628 648 629 struct mlx5e_rq; 649 630 typedef void (*mlx5e_fp_handle_rx_cqe)(struct mlx5e_rq*, struct mlx5_cqe64*); ··· 670 661 struct mlx5e_rq_frag_info arr[MLX5E_MAX_RX_FRAGS]; 671 662 u8 num_frags; 672 663 u8 log_num_frags; 673 - u8 wqe_bulk; 664 + u16 wqe_bulk; 665 + u16 refill_unit; 674 666 u8 wqe_index_mask; 675 667 }; 676 668 677 669 struct mlx5e_dma_info { 678 670 dma_addr_t addr; 679 - struct page *page; 671 + union { 672 + struct mlx5e_frag_page *frag_page; 673 + struct page *page; 674 + }; 680 675 }; 681 676 682 677 struct mlx5e_shampo_hd { 683 678 u32 mkey; 684 679 struct mlx5e_dma_info *info; 685 - struct page *last_page; 680 + struct mlx5e_frag_page *pages; 681 + u16 curr_page_index; 686 682 u16 hd_per_wq; 687 683 u16 hd_per_wqe; 688 684 unsigned long *bitmap; ··· 716 702 struct { 717 703 struct mlx5_wq_cyc wq; 718 704 struct mlx5e_wqe_frag_info *frags; 719 - union mlx5e_alloc_unit *alloc_units; 705 + union mlx5e_alloc_units *alloc_units; 720 706 struct mlx5e_rq_frags_info info; 721 707 mlx5e_fp_skb_from_cqe skb_from_cqe; 722 708 } wqe; ··· 752 738 struct mlx5e_rq_stats *stats; 753 739 struct mlx5e_cq cq; 754 740 struct mlx5e_cq_decomp cqd; 755 - struct mlx5e_page_cache page_cache; 756 741 struct hwtstamp_config *tstamp; 757 742 struct mlx5_clock *clock; 758 743 struct mlx5e_icosq *icosq;
+49 -4
drivers/net/ethernet/mellanox/mlx5/core/en/params.c
@@ -667,6 +667,48 @@
 	return first_frag_size + (MLX5E_MAX_RX_FRAGS - 2) * frag_size + PAGE_SIZE;
 }
 
+static void mlx5e_rx_compute_wqe_bulk_params(struct mlx5e_params *params,
+					     struct mlx5e_rq_frags_info *info)
+{
+	u16 bulk_bound_rq_size = (1 << params->log_rq_mtu_frames) / 4;
+	u32 bulk_bound_rq_size_in_bytes;
+	u32 sum_frag_strides = 0;
+	u32 wqe_bulk_in_bytes;
+	u16 split_factor;
+	u32 wqe_bulk;
+	int i;
+
+	for (i = 0; i < info->num_frags; i++)
+		sum_frag_strides += info->arr[i].frag_stride;
+
+	/* For MTUs larger than PAGE_SIZE, align to PAGE_SIZE to reflect
+	 * amount of consumed pages per wqe in bytes.
+	 */
+	if (sum_frag_strides > PAGE_SIZE)
+		sum_frag_strides = ALIGN(sum_frag_strides, PAGE_SIZE);
+
+	bulk_bound_rq_size_in_bytes = bulk_bound_rq_size * sum_frag_strides;
+
+#define MAX_WQE_BULK_BYTES(xdp) ((xdp ? 256 : 512) * 1024)
+
+	/* A WQE bulk should not exceed min(512KB, 1/4 of rq size). For XDP
+	 * keep bulk size smaller to avoid filling the page_pool cache on
+	 * every bulk refill.
+	 */
+	wqe_bulk_in_bytes = min_t(u32, MAX_WQE_BULK_BYTES(params->xdp_prog),
+				  bulk_bound_rq_size_in_bytes);
+	wqe_bulk = DIV_ROUND_UP(wqe_bulk_in_bytes, sum_frag_strides);
+
+	/* Make sure that allocations don't start when the page is still used
+	 * by older WQEs.
+	 */
+	info->wqe_bulk = max_t(u16, info->wqe_index_mask + 1, wqe_bulk);
+
+	split_factor = DIV_ROUND_UP(MAX_WQE_BULK_BYTES(params->xdp_prog),
+				    PP_ALLOC_CACHE_REFILL * PAGE_SIZE);
+	info->refill_unit = DIV_ROUND_UP(info->wqe_bulk, split_factor);
+}
+
 #define DEFAULT_FRAG_SIZE (2048)
 
 static int mlx5e_build_rq_frags_info(struct mlx5_core_dev *mdev,
@@ -816,11 +774,14 @@
 	}
 
 out:
-	/* Bulking optimization to skip allocation until at least 8 WQEs can be
-	 * allocated in a row. At the same time, never start allocation when
-	 * the page is still used by older WQEs.
+	/* Bulking optimization to skip allocation until a large enough number
+	 * of WQEs can be allocated in a row. Bulking also influences how well
+	 * deferred page release works.
 	 */
-	info->wqe_bulk = max_t(u8, info->wqe_index_mask + 1, 8);
+	mlx5e_rx_compute_wqe_bulk_params(params, info);
+
+	mlx5_core_dbg(mdev, "%s: wqe_bulk = %u, wqe_bulk_refill_unit = %u\n",
+		      __func__, info->wqe_bulk, info->refill_unit);
 
 	info->log_num_frags = order_base_2(info->num_frags);
 
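
To get a feel for the numbers that mlx5e_rx_compute_wqe_bulk_params() produces, the arithmetic can be ported to userspace. This is only a sketch: it assumes 4 KiB pages and a page_pool cache refill of 64 pages for PP_ALLOC_CACHE_REFILL, and the model_* names, the struct and the parameter list are invented for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumptions of this sketch: 4 KiB pages, page_pool cache refill of 64 pages. */
#define MODEL_PAGE_SIZE 4096u
#define MODEL_PP_ALLOC_CACHE_REFILL 64u

#define MODEL_DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))
#define MODEL_ALIGN(x, a) (MODEL_DIV_ROUND_UP(x, a) * (a))
#define MODEL_MAX_WQE_BULK_BYTES(xdp) (((xdp) ? 256u : 512u) * 1024u)

struct model_bulk {
	uint16_t wqe_bulk;
	uint16_t refill_unit;
};

static struct model_bulk model_compute_wqe_bulk(unsigned int log_rq_size,
						const uint32_t *frag_strides,
						unsigned int num_frags,
						uint8_t wqe_index_mask,
						bool xdp)
{
	uint16_t bulk_bound_rq_size = (uint16_t)((1u << log_rq_size) / 4);
	uint32_t max_bulk_bytes = MODEL_MAX_WQE_BULK_BYTES(xdp);
	uint32_t sum_frag_strides = 0;
	uint32_t bound_bytes, bulk_bytes, wqe_bulk, split_factor;
	struct model_bulk out;
	unsigned int i;

	assert(num_frags > 0);
	for (i = 0; i < num_frags; i++)
		sum_frag_strides += frag_strides[i];

	/* WQEs larger than a page consume whole pages. */
	if (sum_frag_strides > MODEL_PAGE_SIZE)
		sum_frag_strides = MODEL_ALIGN(sum_frag_strides, MODEL_PAGE_SIZE);

	/* Bulk is bounded by min(max bulk bytes, 1/4 of the rq size). */
	bound_bytes = bulk_bound_rq_size * sum_frag_strides;
	bulk_bytes = bound_bytes < max_bulk_bytes ? bound_bytes : max_bulk_bytes;
	wqe_bulk = MODEL_DIV_ROUND_UP(bulk_bytes, sum_frag_strides);

	/* Never go below the number of WQEs that share one page. */
	if (wqe_bulk < (uint32_t)wqe_index_mask + 1)
		wqe_bulk = (uint32_t)wqe_index_mask + 1;
	out.wqe_bulk = (uint16_t)wqe_bulk;

	/* Split the bulk into chunks of about one page_pool cache refill. */
	split_factor = MODEL_DIV_ROUND_UP(max_bulk_bytes,
					  MODEL_PP_ALLOC_CACHE_REFILL * MODEL_PAGE_SIZE);
	out.refill_unit = (uint16_t)MODEL_DIV_ROUND_UP((uint32_t)out.wqe_bulk,
						       split_factor);
	return out;
}
```

For a hypothetical 1024-entry rq with a single 2048-byte fragment, a wqe_index_mask of 1 and no XDP program, this yields a bulk of 256 WQEs refilled in units of 128. With an XDP program and one page-sized fragment, the smaller 256 KB cap gives a bulk of 64 WQEs refilled in a single unit.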
+2 -2
drivers/net/ethernet/mellanox/mlx5/core/en/reporter_rx.c
@@ -121,9 +121,9 @@
 
 	mlx5e_reset_icosq_cc_pc(icosq);
 
-	mlx5e_free_rx_in_progress_descs(rq);
+	mlx5e_free_rx_missing_descs(rq);
 	if (xskrq)
-		mlx5e_free_rx_in_progress_descs(xskrq);
+		mlx5e_free_rx_missing_descs(xskrq);
 
 	clear_bit(MLX5E_SQ_STATE_RECOVERING, &icosq->state);
 	mlx5e_activate_icosq(icosq);
+2 -4
drivers/net/ethernet/mellanox/mlx5/core/en/txrx.h
@@ -65,13 +65,11 @@
 int mlx5e_poll_ico_cq(struct mlx5e_cq *cq);
 
 /* RX */
-void mlx5e_page_dma_unmap(struct mlx5e_rq *rq, struct page *page);
-void mlx5e_page_release_dynamic(struct mlx5e_rq *rq, struct page *page, bool recycle);
 INDIRECT_CALLABLE_DECLARE(bool mlx5e_post_rx_wqes(struct mlx5e_rq *rq));
 INDIRECT_CALLABLE_DECLARE(bool mlx5e_post_rx_mpwqes(struct mlx5e_rq *rq));
 int mlx5e_poll_rx_cq(struct mlx5e_cq *cq, int budget);
 void mlx5e_free_rx_descs(struct mlx5e_rq *rq);
-void mlx5e_free_rx_in_progress_descs(struct mlx5e_rq *rq);
+void mlx5e_free_rx_missing_descs(struct mlx5e_rq *rq);
 
 static inline bool mlx5e_rx_hw_stamp(struct hwtstamp_config *config)
 {
@@ -487,7 +489,7 @@
 
 static inline struct mlx5e_mpw_info *mlx5e_get_mpw_info(struct mlx5e_rq *rq, int i)
 {
-	size_t isz = struct_size(rq->mpwqe.info, alloc_units, rq->mpwqe.pages_per_wqe);
+	size_t isz = struct_size(rq->mpwqe.info, alloc_units.frag_pages, rq->mpwqe.pages_per_wqe);
 
 	return (struct mlx5e_mpw_info *)((char *)rq->mpwqe.info + array_size(i, isz));
 }
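
The indexing done by mlx5e_get_mpw_info() can be illustrated with a simplified userspace model: records with a trailing flexible array are packed back to back in one allocation, and record i is found by multiplying the per-record size by i. The struct and the model_* helpers below are hypothetical; unlike the kernel's struct_size() and array_size(), this sketch does no overflow checking, and it uses a plain flexible array member rather than a union of flexible arrays.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for a per-WQE info record with a trailing array. */
struct model_info {
	uint16_t consumed_strides;
	void *units[]; /* one entry per page in the WQE */
};

/* Size of one record holding n trailing entries (no overflow checking here). */
static size_t model_record_size(size_t n)
{
	return sizeof(struct model_info) + n * sizeof(void *);
}

/* One zeroed, contiguous allocation for count records of n entries each. */
static void *model_alloc_infos(size_t count, size_t n)
{
	return calloc(count, model_record_size(n));
}

/* Locate record i inside the contiguous allocation. */
static struct model_info *model_get_info(void *base, size_t i, size_t n)
{
	assert(base != NULL);
	return (struct model_info *)((char *)base + i * model_record_size(n));
}
```

Because the stride depends on the element type of the trailing array, switching the array from one element type to another changes the stride as well, which is why the struct_size() call above names alloc_units.frag_pages explicitly.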
+4 -6
drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
@@ -209,8 +209,6 @@
 			goto xdp_abort;
 		__set_bit(MLX5E_RQ_FLAG_XDP_XMIT, rq->flags);
 		__set_bit(MLX5E_RQ_FLAG_XDP_REDIRECT, rq->flags);
-		if (xdp->rxq->mem.type != MEM_TYPE_XSK_BUFF_POOL)
-			mlx5e_page_dma_unmap(rq, virt_to_page(xdp->data));
 		rq->stats->xdp_redirect++;
 		return true;
 	default:
@@ -505,7 +507,6 @@
 static void mlx5e_free_xdpsq_desc(struct mlx5e_xdpsq *sq,
 				  struct mlx5e_xdp_wqe_info *wi,
 				  u32 *xsk_frames,
-				  bool recycle,
 				  struct xdp_frame_bulk *bq)
 {
 	struct mlx5e_xdp_info_fifo *xdpi_fifo = &sq->db.xdpi_fifo;
@@ -522,7 +525,8 @@
 			break;
 		case MLX5E_XDP_XMIT_MODE_PAGE:
 			/* XDP_TX from the regular RQ */
-			mlx5e_page_release_dynamic(xdpi.page.rq, xdpi.page.page, recycle);
+			page_pool_put_defragged_page(xdpi.page.rq->page_pool,
+						     xdpi.page.page, -1, true);
 			break;
 		case MLX5E_XDP_XMIT_MODE_XSK:
 			/* AF_XDP send */
@@ -577,7 +579,7 @@
 
 			sqcc += wi->num_wqebbs;
 
-			mlx5e_free_xdpsq_desc(sq, wi, &xsk_frames, true, &bq);
+			mlx5e_free_xdpsq_desc(sq, wi, &xsk_frames, &bq);
 		} while (!last_wqe);
 
 		if (unlikely(get_cqe_opcode(cqe) != MLX5_CQE_REQ)) {
@@ -624,7 +626,7 @@
 
 		sq->cc += wi->num_wqebbs;
 
-		mlx5e_free_xdpsq_desc(sq, wi, &xsk_frames, false, &bq);
+		mlx5e_free_xdpsq_desc(sq, wi, &xsk_frames, &bq);
 	}
 
 	xdp_flush_frame_bulk(&bq);
+27 -27
drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.c
··· 22 22 struct mlx5e_icosq *icosq = rq->icosq; 23 23 struct mlx5_wq_cyc *wq = &icosq->wq; 24 24 struct mlx5e_umr_wqe *umr_wqe; 25 + struct xdp_buff **xsk_buffs; 25 26 int batch, i; 26 27 u32 offset; /* 17-bit value with MTT. */ 27 28 u16 pi; ··· 30 29 if (unlikely(!xsk_buff_can_alloc(rq->xsk_pool, rq->mpwqe.pages_per_wqe))) 31 30 goto err; 32 31 33 - BUILD_BUG_ON(sizeof(wi->alloc_units[0]) != sizeof(wi->alloc_units[0].xsk)); 34 32 XSK_CHECK_PRIV_TYPE(struct mlx5e_xdp_buff); 35 - batch = xsk_buff_alloc_batch(rq->xsk_pool, (struct xdp_buff **)wi->alloc_units, 33 + xsk_buffs = (struct xdp_buff **)wi->alloc_units.xsk_buffs; 34 + batch = xsk_buff_alloc_batch(rq->xsk_pool, xsk_buffs, 36 35 rq->mpwqe.pages_per_wqe); 37 36 38 37 /* If batch < pages_per_wqe, either: ··· 42 41 * the first error, which will mean there are no more valid descriptors. 43 42 */ 44 43 for (; batch < rq->mpwqe.pages_per_wqe; batch++) { 45 - wi->alloc_units[batch].xsk = xsk_buff_alloc(rq->xsk_pool); 46 - if (unlikely(!wi->alloc_units[batch].xsk)) 44 + xsk_buffs[batch] = xsk_buff_alloc(rq->xsk_pool); 45 + if (unlikely(!xsk_buffs[batch])) 47 46 goto err_reuse_batch; 48 47 } 49 48 ··· 53 52 54 53 if (likely(rq->mpwqe.umr_mode == MLX5E_MPWRQ_UMR_MODE_ALIGNED)) { 55 54 for (i = 0; i < batch; i++) { 56 - struct mlx5e_xdp_buff *mxbuf = xsk_buff_to_mxbuf(wi->alloc_units[i].xsk); 57 - dma_addr_t addr = xsk_buff_xdp_get_frame_dma(wi->alloc_units[i].xsk); 55 + struct mlx5e_xdp_buff *mxbuf = xsk_buff_to_mxbuf(xsk_buffs[i]); 56 + dma_addr_t addr = xsk_buff_xdp_get_frame_dma(xsk_buffs[i]); 58 57 59 58 umr_wqe->inline_mtts[i] = (struct mlx5_mtt) { 60 59 .ptag = cpu_to_be64(addr | MLX5_EN_WR), ··· 63 62 } 64 63 } else if (unlikely(rq->mpwqe.umr_mode == MLX5E_MPWRQ_UMR_MODE_UNALIGNED)) { 65 64 for (i = 0; i < batch; i++) { 66 - struct mlx5e_xdp_buff *mxbuf = xsk_buff_to_mxbuf(wi->alloc_units[i].xsk); 67 - dma_addr_t addr = xsk_buff_xdp_get_frame_dma(wi->alloc_units[i].xsk); 65 + struct mlx5e_xdp_buff *mxbuf = 
xsk_buff_to_mxbuf(xsk_buffs[i]); 66 + dma_addr_t addr = xsk_buff_xdp_get_frame_dma(xsk_buffs[i]); 68 67 69 68 umr_wqe->inline_ksms[i] = (struct mlx5_ksm) { 70 69 .key = rq->mkey_be, ··· 76 75 u32 mapping_size = 1 << (rq->mpwqe.page_shift - 2); 77 76 78 77 for (i = 0; i < batch; i++) { 79 - struct mlx5e_xdp_buff *mxbuf = xsk_buff_to_mxbuf(wi->alloc_units[i].xsk); 80 - dma_addr_t addr = xsk_buff_xdp_get_frame_dma(wi->alloc_units[i].xsk); 78 + struct mlx5e_xdp_buff *mxbuf = xsk_buff_to_mxbuf(xsk_buffs[i]); 79 + dma_addr_t addr = xsk_buff_xdp_get_frame_dma(xsk_buffs[i]); 81 80 82 81 umr_wqe->inline_ksms[i << 2] = (struct mlx5_ksm) { 83 82 .key = rq->mkey_be, ··· 103 102 __be32 frame_size = cpu_to_be32(rq->xsk_pool->chunk_size); 104 103 105 104 for (i = 0; i < batch; i++) { 106 - struct mlx5e_xdp_buff *mxbuf = xsk_buff_to_mxbuf(wi->alloc_units[i].xsk); 107 - dma_addr_t addr = xsk_buff_xdp_get_frame_dma(wi->alloc_units[i].xsk); 105 + struct mlx5e_xdp_buff *mxbuf = xsk_buff_to_mxbuf(xsk_buffs[i]); 106 + dma_addr_t addr = xsk_buff_xdp_get_frame_dma(xsk_buffs[i]); 108 107 109 108 umr_wqe->inline_klms[i << 1] = (struct mlx5_klm) { 110 109 .key = rq->mkey_be, ··· 120 119 } 121 120 } 122 121 123 - bitmap_zero(wi->xdp_xmit_bitmap, rq->mpwqe.pages_per_wqe); 122 + bitmap_zero(wi->skip_release_bitmap, rq->mpwqe.pages_per_wqe); 124 123 wi->consumed_strides = 0; 125 124 126 125 umr_wqe->ctrl.opmod_idx_opcode = ··· 150 149 151 150 err_reuse_batch: 152 151 while (--batch >= 0) 153 - xsk_buff_free(wi->alloc_units[batch].xsk); 152 + xsk_buff_free(xsk_buffs[batch]); 154 153 155 154 err: 156 155 rq->stats->buff_alloc_err++; ··· 164 163 u32 contig, alloc; 165 164 int i; 166 165 167 - /* mlx5e_init_frags_partition creates a 1:1 mapping between 168 - * rq->wqe.frags and rq->wqe.alloc_units, which allows us to 169 - * allocate XDP buffers straight into alloc_units. 
166 + /* Each rq->wqe.frags->xskp is 1:1 mapped to an element inside the 167 + * rq->wqe.alloc_units->xsk_buffs array allocated here. 170 168 */ 171 - BUILD_BUG_ON(sizeof(rq->wqe.alloc_units[0]) != 172 - sizeof(rq->wqe.alloc_units[0].xsk)); 173 - buffs = (struct xdp_buff **)rq->wqe.alloc_units; 169 + buffs = rq->wqe.alloc_units->xsk_buffs; 174 170 contig = mlx5_wq_cyc_get_size(wq) - ix; 175 171 if (wqe_bulk <= contig) { 176 172 alloc = xsk_buff_alloc_batch(rq->xsk_pool, buffs + ix, wqe_bulk); ··· 187 189 /* Assumes log_num_frags == 0. */ 188 190 frag = &rq->wqe.frags[j]; 189 191 190 - addr = xsk_buff_xdp_get_frame_dma(frag->au->xsk); 192 + addr = xsk_buff_xdp_get_frame_dma(*frag->xskp); 191 193 wqe->data[0].addr = cpu_to_be64(addr + rq->buff.headroom); 194 + frag->flags &= ~BIT(MLX5E_WQE_FRAG_SKIP_RELEASE); 192 195 } 193 196 194 197 return alloc; ··· 210 211 /* Assumes log_num_frags == 0. */ 211 212 frag = &rq->wqe.frags[j]; 212 213 213 - frag->au->xsk = xsk_buff_alloc(rq->xsk_pool); 214 - if (unlikely(!frag->au->xsk)) 214 + *frag->xskp = xsk_buff_alloc(rq->xsk_pool); 215 + if (unlikely(!*frag->xskp)) 215 216 return i; 216 217 217 - addr = xsk_buff_xdp_get_frame_dma(frag->au->xsk); 218 + addr = xsk_buff_xdp_get_frame_dma(*frag->xskp); 218 219 wqe->data[0].addr = cpu_to_be64(addr + rq->buff.headroom); 220 + frag->flags &= ~BIT(MLX5E_WQE_FRAG_SKIP_RELEASE); 219 221 } 220 222 221 223 return wqe_bulk; ··· 251 251 u32 head_offset, 252 252 u32 page_idx) 253 253 { 254 - struct mlx5e_xdp_buff *mxbuf = xsk_buff_to_mxbuf(wi->alloc_units[page_idx].xsk); 254 + struct mlx5e_xdp_buff *mxbuf = xsk_buff_to_mxbuf(wi->alloc_units.xsk_buffs[page_idx]); 255 255 struct bpf_prog *prog; 256 256 257 257 /* Check packet size. 
Note LRO doesn't use linear SKB */ ··· 291 291 prog = rcu_dereference(rq->xdp_prog); 292 292 if (likely(prog && mlx5e_xdp_handle(rq, prog, mxbuf))) { 293 293 if (likely(__test_and_clear_bit(MLX5E_RQ_FLAG_XDP_XMIT, rq->flags))) 294 - __set_bit(page_idx, wi->xdp_xmit_bitmap); /* non-atomic */ 294 + __set_bit(page_idx, wi->skip_release_bitmap); /* non-atomic */ 295 295 return NULL; /* page/packet was consumed by XDP */ 296 296 } 297 297 ··· 306 306 struct mlx5_cqe64 *cqe, 307 307 u32 cqe_bcnt) 308 308 { 309 - struct mlx5e_xdp_buff *mxbuf = xsk_buff_to_mxbuf(wi->au->xsk); 309 + struct mlx5e_xdp_buff *mxbuf = xsk_buff_to_mxbuf(*wi->xskp); 310 310 struct bpf_prog *prog; 311 311 312 312 /* wi->offset is not used in this function, because xdp->data and the
+109 -60
drivers/net/ethernet/mellanox/mlx5/core/en_main.c
··· 262 262 263 263 shampo->bitmap = bitmap_zalloc_node(shampo->hd_per_wq, GFP_KERNEL, 264 264 node); 265 - if (!shampo->bitmap) 266 - return -ENOMEM; 267 - 268 265 shampo->info = kvzalloc_node(array_size(shampo->hd_per_wq, 269 266 sizeof(*shampo->info)), 270 267 GFP_KERNEL, node); 271 - if (!shampo->info) { 272 - kvfree(shampo->bitmap); 273 - return -ENOMEM; 274 - } 268 + shampo->pages = kvzalloc_node(array_size(shampo->hd_per_wq, 269 + sizeof(*shampo->pages)), 270 + GFP_KERNEL, node); 271 + if (!shampo->bitmap || !shampo->info || !shampo->pages) 272 + goto err_nomem; 273 + 275 274 return 0; 275 + 276 + err_nomem: 277 + kvfree(shampo->info); 278 + kvfree(shampo->bitmap); 279 + kvfree(shampo->pages); 280 + 281 + return -ENOMEM; 276 282 } 277 283 278 284 static void mlx5e_rq_shampo_hd_info_free(struct mlx5e_rq *rq) 279 285 { 280 286 kvfree(rq->mpwqe.shampo->bitmap); 281 287 kvfree(rq->mpwqe.shampo->info); 288 + kvfree(rq->mpwqe.shampo->pages); 282 289 } 283 290 284 291 static int mlx5e_rq_alloc_mpwqe_info(struct mlx5e_rq *rq, int node) ··· 293 286 int wq_sz = mlx5_wq_ll_get_size(&rq->mpwqe.wq); 294 287 size_t alloc_size; 295 288 296 - alloc_size = array_size(wq_sz, struct_size(rq->mpwqe.info, alloc_units, 289 + alloc_size = array_size(wq_sz, struct_size(rq->mpwqe.info, 290 + alloc_units.frag_pages, 297 291 rq->mpwqe.pages_per_wqe)); 298 292 299 293 rq->mpwqe.info = kvzalloc_node(alloc_size, GFP_KERNEL, node); 300 294 if (!rq->mpwqe.info) 301 295 return -ENOMEM; 296 + 297 + /* For deferred page release (release right before alloc), make sure 298 + * that on first round release is not called. 
299 + */ 300 + for (int i = 0; i < wq_sz; i++) { 301 + struct mlx5e_mpw_info *wi = mlx5e_get_mpw_info(rq, i); 302 + 303 + bitmap_fill(wi->skip_release_bitmap, rq->mpwqe.pages_per_wqe); 304 + } 302 305 303 306 mlx5e_build_umr_wqe(rq, rq->icosq, &rq->mpwqe.umr_wqe); 304 307 ··· 516 499 struct mlx5e_wqe_frag_info *prev = NULL; 517 500 int i; 518 501 519 - if (rq->xsk_pool) { 520 - /* Assumptions used by XSK batched allocator. */ 521 - WARN_ON(rq->wqe.info.num_frags != 1); 522 - WARN_ON(rq->wqe.info.log_num_frags != 0); 523 - WARN_ON(rq->wqe.info.arr[0].frag_stride != PAGE_SIZE); 524 - } 502 + WARN_ON(rq->xsk_pool); 525 503 526 - next_frag.au = &rq->wqe.alloc_units[0]; 504 + next_frag.frag_page = &rq->wqe.alloc_units->frag_pages[0]; 505 + 506 + /* Skip first release due to deferred release. */ 507 + next_frag.flags = BIT(MLX5E_WQE_FRAG_SKIP_RELEASE); 527 508 528 509 for (i = 0; i < mlx5_wq_cyc_get_size(&rq->wqe.wq); i++) { 529 510 struct mlx5e_rq_frag_info *frag_info = &rq->wqe.info.arr[0]; ··· 531 516 532 517 for (f = 0; f < rq->wqe.info.num_frags; f++, frag++) { 533 518 if (next_frag.offset + frag_info[f].frag_stride > PAGE_SIZE) { 534 - next_frag.au++; 519 + /* Pages are assigned at runtime. */ 520 + next_frag.frag_page++; 535 521 next_frag.offset = 0; 536 522 if (prev) 537 - prev->last_in_page = true; 523 + prev->flags |= BIT(MLX5E_WQE_FRAG_LAST_IN_PAGE); 538 524 } 539 525 *frag = next_frag; 540 526 ··· 546 530 } 547 531 548 532 if (prev) 549 - prev->last_in_page = true; 533 + prev->flags |= BIT(MLX5E_WQE_FRAG_LAST_IN_PAGE); 550 534 } 551 535 552 - static int mlx5e_init_au_list(struct mlx5e_rq *rq, int wq_sz, int node) 536 + static void mlx5e_init_xsk_buffs(struct mlx5e_rq *rq) 553 537 { 554 - int len = wq_sz << rq->wqe.info.log_num_frags; 538 + int i; 555 539 556 - rq->wqe.alloc_units = kvzalloc_node(array_size(len, sizeof(*rq->wqe.alloc_units)), 557 - GFP_KERNEL, node); 558 - if (!rq->wqe.alloc_units) 540 + /* Assumptions used by XSK batched allocator. 
*/ 541 + WARN_ON(rq->wqe.info.num_frags != 1); 542 + WARN_ON(rq->wqe.info.log_num_frags != 0); 543 + WARN_ON(rq->wqe.info.arr[0].frag_stride != PAGE_SIZE); 544 + 545 + /* Considering the above assumptions a fragment maps to a single 546 + * xsk_buff. 547 + */ 548 + for (i = 0; i < mlx5_wq_cyc_get_size(&rq->wqe.wq); i++) { 549 + rq->wqe.frags[i].xskp = &rq->wqe.alloc_units->xsk_buffs[i]; 550 + 551 + /* Skip first release due to deferred release as WQES are 552 + * not allocated yet. 553 + */ 554 + rq->wqe.frags[i].flags |= BIT(MLX5E_WQE_FRAG_SKIP_RELEASE); 555 + } 556 + } 557 + 558 + static int mlx5e_init_wqe_alloc_info(struct mlx5e_rq *rq, int node) 559 + { 560 + int wq_sz = mlx5_wq_cyc_get_size(&rq->wqe.wq); 561 + int len = wq_sz << rq->wqe.info.log_num_frags; 562 + struct mlx5e_wqe_frag_info *frags; 563 + union mlx5e_alloc_units *aus; 564 + int aus_sz; 565 + 566 + if (rq->xsk_pool) 567 + aus_sz = sizeof(*aus->xsk_buffs); 568 + else 569 + aus_sz = sizeof(*aus->frag_pages); 570 + 571 + aus = kvzalloc_node(array_size(len, aus_sz), GFP_KERNEL, node); 572 + if (!aus) 559 573 return -ENOMEM; 560 574 561 - mlx5e_init_frags_partition(rq); 575 + frags = kvzalloc_node(array_size(len, sizeof(*frags)), GFP_KERNEL, node); 576 + if (!frags) { 577 + kvfree(aus); 578 + return -ENOMEM; 579 + } 580 + 581 + rq->wqe.alloc_units = aus; 582 + rq->wqe.frags = frags; 583 + 584 + if (rq->xsk_pool) 585 + mlx5e_init_xsk_buffs(rq); 586 + else 587 + mlx5e_init_frags_partition(rq); 562 588 563 589 return 0; 564 590 } 565 591 566 - static void mlx5e_free_au_list(struct mlx5e_rq *rq) 592 + static void mlx5e_free_wqe_alloc_info(struct mlx5e_rq *rq) 567 593 { 594 + kvfree(rq->wqe.frags); 568 595 kvfree(rq->wqe.alloc_units); 569 596 } 570 597 ··· 752 693 struct mlx5e_rq_param *rqp, 753 694 int node, struct mlx5e_rq *rq) 754 695 { 755 - struct page_pool_params pp_params = { 0 }; 756 696 struct mlx5_core_dev *mdev = rq->mdev; 757 697 void *rqc = rqp->rqc; 758 698 void *rqc_wq = MLX5_ADDR_OF(rqc, 
rqc, wq); ··· 836 778 rq->wqe.info = rqp->frags_info; 837 779 rq->buff.frame0_sz = rq->wqe.info.arr[0].frag_stride; 838 780 839 - rq->wqe.frags = 840 - kvzalloc_node(array_size(sizeof(*rq->wqe.frags), 841 - (wq_sz << rq->wqe.info.log_num_frags)), 842 - GFP_KERNEL, node); 843 - if (!rq->wqe.frags) { 844 - err = -ENOMEM; 845 - goto err_rq_wq_destroy; 846 - } 847 - 848 - err = mlx5e_init_au_list(rq, wq_sz, node); 781 + err = mlx5e_init_wqe_alloc_info(rq, node); 849 782 if (err) 850 - goto err_rq_frags; 783 + goto err_rq_wq_destroy; 851 784 } 852 785 853 786 if (xsk) { ··· 847 798 xsk_pool_set_rxq_info(rq->xsk_pool, &rq->xdp_rxq); 848 799 } else { 849 800 /* Create a page_pool and register it with rxq */ 801 + struct page_pool_params pp_params = { 0 }; 802 + 850 803 pp_params.order = 0; 851 - pp_params.flags = 0; /* No-internal DMA mapping in page_pool */ 804 + pp_params.flags = PP_FLAG_DMA_MAP | PP_FLAG_DMA_SYNC_DEV | PP_FLAG_PAGE_FRAG; 852 805 pp_params.pool_size = pool_size; 853 806 pp_params.nid = node; 854 807 pp_params.dev = rq->pdev; 855 808 pp_params.dma_dir = rq->buff.map_dir; 809 + pp_params.max_len = PAGE_SIZE; 856 810 857 811 /* page_pool can be used even when there is no rq->xdp_prog, 858 812 * given page_pool does not handle DMA mapping there is no ··· 921 869 rq->dim.mode = DIM_CQ_PERIOD_MODE_START_FROM_EQE; 922 870 } 923 871 924 - rq->page_cache.head = 0; 925 - rq->page_cache.tail = 0; 926 - 927 872 return 0; 928 873 929 874 err_destroy_page_pool: ··· 937 888 mlx5e_free_mpwqe_rq_drop_page(rq); 938 889 break; 939 890 default: /* MLX5_WQ_TYPE_CYCLIC */ 940 - mlx5e_free_au_list(rq); 941 - err_rq_frags: 942 - kvfree(rq->wqe.frags); 891 + mlx5e_free_wqe_alloc_info(rq); 943 892 } 944 893 err_rq_wq_destroy: 945 894 mlx5_wq_destroy(&rq->wq_ctrl); ··· 951 904 static void mlx5e_free_rq(struct mlx5e_rq *rq) 952 905 { 953 906 struct bpf_prog *old_prog; 954 - int i; 955 907 956 908 if (xdp_rxq_info_is_reg(&rq->xdp_rxq)) { 957 909 old_prog = 
rcu_dereference_protected(rq->xdp_prog, ··· 967 921 mlx5e_rq_free_shampo(rq); 968 922 break; 969 923 default: /* MLX5_WQ_TYPE_CYCLIC */ 970 - kvfree(rq->wqe.frags); 971 - mlx5e_free_au_list(rq); 972 - } 973 - 974 - for (i = rq->page_cache.head; i != rq->page_cache.tail; 975 - i = (i + 1) & (MLX5E_CACHE_SIZE - 1)) { 976 - /* With AF_XDP, page_cache is not used, so this loop is not 977 - * entered, and it's safe to call mlx5e_page_release_dynamic 978 - * directly. 979 - */ 980 - mlx5e_page_release_dynamic(rq, rq->page_cache.page_cache[i], false); 924 + mlx5e_free_wqe_alloc_info(rq); 981 925 } 982 926 983 927 xdp_rxq_info_unreg(&rq->xdp_rxq); ··· 1130 1094 return -ETIMEDOUT; 1131 1095 } 1132 1096 1133 - void mlx5e_free_rx_in_progress_descs(struct mlx5e_rq *rq) 1097 + void mlx5e_free_rx_missing_descs(struct mlx5e_rq *rq) 1134 1098 { 1135 1099 struct mlx5_wq_ll *wq; 1136 1100 u16 head; ··· 1142 1106 wq = &rq->mpwqe.wq; 1143 1107 head = wq->head; 1144 1108 1145 - /* Outstanding UMR WQEs (in progress) start at wq->head */ 1146 - for (i = 0; i < rq->mpwqe.umr_in_progress; i++) { 1109 + /* Release WQEs that are in missing state: they have been 1110 + * popped from the list after completion but were not freed 1111 + * due to deferred release. 1112 + * Also free the linked-list reserved entry, hence the "+ 1". 
1113 + */ 1114 + for (i = 0; i < mlx5_wq_ll_missing(wq) + 1; i++) { 1147 1115 rq->dealloc_wqe(rq, head); 1148 1116 head = mlx5_wq_ll_get_wqe_next_ix(wq, head); 1149 1117 } ··· 1174 1134 if (rq->wq_type == MLX5_WQ_TYPE_LINKED_LIST_STRIDING_RQ) { 1175 1135 struct mlx5_wq_ll *wq = &rq->mpwqe.wq; 1176 1136 1177 - mlx5e_free_rx_in_progress_descs(rq); 1137 + mlx5e_free_rx_missing_descs(rq); 1178 1138 1179 1139 while (!mlx5_wq_ll_is_empty(wq)) { 1180 1140 struct mlx5e_rx_wqe_ll *wqe; ··· 1192 1152 0, true); 1193 1153 } else { 1194 1154 struct mlx5_wq_cyc *wq = &rq->wqe.wq; 1155 + u16 missing = mlx5_wq_cyc_missing(wq); 1156 + u16 head = mlx5_wq_cyc_get_head(wq); 1195 1157 1196 1158 while (!mlx5_wq_cyc_is_empty(wq)) { 1197 1159 wqe_ix = mlx5_wq_cyc_get_tail(wq); 1198 1160 rq->dealloc_wqe(rq, wqe_ix); 1199 1161 mlx5_wq_cyc_pop(wq); 1162 + } 1163 + /* Missing slots might also contain unreleased pages due to 1164 + * deferred release. 1165 + */ 1166 + while (missing--) { 1167 + wqe_ix = mlx5_wq_cyc_ctr2ix(wq, head++); 1168 + rq->dealloc_wqe(rq, wqe_ix); 1200 1169 } 1201 1170 } 1202 1171
+242 -214
drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
···
 	return mlx5e_decompress_cqes_cont(rq, wq, 1, budget_rem);
 }

-static inline bool mlx5e_rx_cache_put(struct mlx5e_rq *rq, struct page *page)
+#define MLX5E_PAGECNT_BIAS_MAX (PAGE_SIZE / 64)
+
+static int mlx5e_page_alloc_fragmented(struct mlx5e_rq *rq,
+				       struct mlx5e_frag_page *frag_page)
 {
-	struct mlx5e_page_cache *cache = &rq->page_cache;
-	u32 tail_next = (cache->tail + 1) & (MLX5E_CACHE_SIZE - 1);
-	struct mlx5e_rq_stats *stats = rq->stats;
+	struct page *page;

-	if (tail_next == cache->head) {
-		stats->cache_full++;
-		return false;
-	}
-
-	if (!dev_page_is_reusable(page)) {
-		stats->cache_waive++;
-		return false;
-	}
-
-	cache->page_cache[cache->tail] = page;
-	cache->tail = tail_next;
-	return true;
-}
-
-static inline bool mlx5e_rx_cache_get(struct mlx5e_rq *rq, union mlx5e_alloc_unit *au)
-{
-	struct mlx5e_page_cache *cache = &rq->page_cache;
-	struct mlx5e_rq_stats *stats = rq->stats;
-	dma_addr_t addr;
-
-	if (unlikely(cache->head == cache->tail)) {
-		stats->cache_empty++;
-		return false;
-	}
-
-	if (page_ref_count(cache->page_cache[cache->head]) != 1) {
-		stats->cache_busy++;
-		return false;
-	}
-
-	au->page = cache->page_cache[cache->head];
-	cache->head = (cache->head + 1) & (MLX5E_CACHE_SIZE - 1);
-	stats->cache_reuse++;
-
-	addr = page_pool_get_dma_addr(au->page);
-	/* Non-XSK always uses PAGE_SIZE. */
-	dma_sync_single_for_device(rq->pdev, addr, PAGE_SIZE, rq->buff.map_dir);
-	return true;
-}
-
-static inline int mlx5e_page_alloc_pool(struct mlx5e_rq *rq, union mlx5e_alloc_unit *au)
-{
-	dma_addr_t addr;
-
-	if (mlx5e_rx_cache_get(rq, au))
-		return 0;
-
-	au->page = page_pool_dev_alloc_pages(rq->page_pool);
-	if (unlikely(!au->page))
+	page = page_pool_dev_alloc_pages(rq->page_pool);
+	if (unlikely(!page))
 		return -ENOMEM;

-	/* Non-XSK always uses PAGE_SIZE. */
-	addr = dma_map_page(rq->pdev, au->page, 0, PAGE_SIZE, rq->buff.map_dir);
-	if (unlikely(dma_mapping_error(rq->pdev, addr))) {
-		page_pool_recycle_direct(rq->page_pool, au->page);
-		au->page = NULL;
-		return -ENOMEM;
-	}
-	page_pool_set_dma_addr(au->page, addr);
+	page_pool_fragment_page(page, MLX5E_PAGECNT_BIAS_MAX);
+
+	*frag_page = (struct mlx5e_frag_page) {
+		.page = page,
+		.frags = 0,
+	};

 	return 0;
 }

-void mlx5e_page_dma_unmap(struct mlx5e_rq *rq, struct page *page)
+static void mlx5e_page_release_fragmented(struct mlx5e_rq *rq,
+					  struct mlx5e_frag_page *frag_page)
 {
-	dma_addr_t dma_addr = page_pool_get_dma_addr(page);
+	u16 drain_count = MLX5E_PAGECNT_BIAS_MAX - frag_page->frags;
+	struct page *page = frag_page->page;

-	dma_unmap_page_attrs(rq->pdev, dma_addr, PAGE_SIZE, rq->buff.map_dir,
-			     DMA_ATTR_SKIP_CPU_SYNC);
-	page_pool_set_dma_addr(page, 0);
-}
-
-void mlx5e_page_release_dynamic(struct mlx5e_rq *rq, struct page *page, bool recycle)
-{
-	if (likely(recycle)) {
-		if (mlx5e_rx_cache_put(rq, page))
-			return;
-
-		mlx5e_page_dma_unmap(rq, page);
-		page_pool_recycle_direct(rq->page_pool, page);
-	} else {
-		mlx5e_page_dma_unmap(rq, page);
-		page_pool_release_page(rq->page_pool, page);
-		put_page(page);
-	}
+	if (page_pool_defrag_page(page, drain_count) == 0)
+		page_pool_put_defragged_page(rq->page_pool, page, -1, true);
 }

 static inline int mlx5e_get_rx_frag(struct mlx5e_rq *rq,
···
 	int err = 0;

 	if (!frag->offset)
-		/* On first frag (offset == 0), replenish page (alloc_unit actually).
-		 * Other frags that point to the same alloc_unit (with a different
+		/* On first frag (offset == 0), replenish page.
+		 * Other frags that point to the same page (with a different
 		 * offset) should just use the new one without replenishing again
 		 * by themselves.
 		 */
-		err = mlx5e_page_alloc_pool(rq, frag->au);
+		err = mlx5e_page_alloc_fragmented(rq, frag->frag_page);

 	return err;
 }

-static inline void mlx5e_put_rx_frag(struct mlx5e_rq *rq,
-				     struct mlx5e_wqe_frag_info *frag,
-				     bool recycle)
+static bool mlx5e_frag_can_release(struct mlx5e_wqe_frag_info *frag)
 {
-	if (frag->last_in_page)
-		mlx5e_page_release_dynamic(rq, frag->au->page, recycle);
+#define CAN_RELEASE_MASK \
+	(BIT(MLX5E_WQE_FRAG_LAST_IN_PAGE) | BIT(MLX5E_WQE_FRAG_SKIP_RELEASE))
+
+#define CAN_RELEASE_VALUE BIT(MLX5E_WQE_FRAG_LAST_IN_PAGE)
+
+	return (frag->flags & CAN_RELEASE_MASK) == CAN_RELEASE_VALUE;
+}
+
+static inline void mlx5e_put_rx_frag(struct mlx5e_rq *rq,
+				     struct mlx5e_wqe_frag_info *frag)
+{
+	if (mlx5e_frag_can_release(frag))
+		mlx5e_page_release_fragmented(rq, frag->frag_page);
 }

 static inline struct mlx5e_wqe_frag_info *get_frag(struct mlx5e_rq *rq, u16 ix)
···
 		if (unlikely(err))
 			goto free_frags;

+		frag->flags &= ~BIT(MLX5E_WQE_FRAG_SKIP_RELEASE);
+
 		headroom = i == 0 ? rq->buff.headroom : 0;
-		addr = page_pool_get_dma_addr(frag->au->page);
+		addr = page_pool_get_dma_addr(frag->frag_page->page);
 		wqe->data[i].addr = cpu_to_be64(addr + frag->offset + headroom);
 	}

···

 free_frags:
 	while (--i >= 0)
-		mlx5e_put_rx_frag(rq, --frag, true);
+		mlx5e_put_rx_frag(rq, --frag);

 	return err;
 }

 static inline void mlx5e_free_rx_wqe(struct mlx5e_rq *rq,
-				     struct mlx5e_wqe_frag_info *wi,
-				     bool recycle)
+				     struct mlx5e_wqe_frag_info *wi)
 {
 	int i;

-	if (rq->xsk_pool) {
-		/* The `recycle` parameter is ignored, and the page is always
-		 * put into the Reuse Ring, because there is no way to return
-		 * the page to the userspace when the interface goes down.
-		 */
-		xsk_buff_free(wi->au->xsk);
-		return;
-	}
-
 	for (i = 0; i < rq->wqe.info.num_frags; i++, wi++)
-		mlx5e_put_rx_frag(rq, wi, recycle);
+		mlx5e_put_rx_frag(rq, wi);
+}
+
+static void mlx5e_xsk_free_rx_wqe(struct mlx5e_wqe_frag_info *wi)
+{
+	if (!(wi->flags & BIT(MLX5E_WQE_FRAG_SKIP_RELEASE)))
+		xsk_buff_free(*wi->xskp);
 }

 static void mlx5e_dealloc_rx_wqe(struct mlx5e_rq *rq, u16 ix)
 {
 	struct mlx5e_wqe_frag_info *wi = get_frag(rq, ix);

-	mlx5e_free_rx_wqe(rq, wi, false);
+	if (rq->xsk_pool)
+		mlx5e_xsk_free_rx_wqe(wi);
+	else
+		mlx5e_free_rx_wqe(rq, wi);
+}
+
+static void mlx5e_xsk_free_rx_wqes(struct mlx5e_rq *rq, u16 ix, int wqe_bulk)
+{
+	struct mlx5_wq_cyc *wq = &rq->wqe.wq;
+	int i;
+
+	for (i = 0; i < wqe_bulk; i++) {
+		int j = mlx5_wq_cyc_ctr2ix(wq, ix + i);
+		struct mlx5e_wqe_frag_info *wi;
+
+		wi = get_frag(rq, j);
+		/* The page is always put into the Reuse Ring, because there
+		 * is no way to return the page to the userspace when the
+		 * interface goes down.
+		 */
+		mlx5e_xsk_free_rx_wqe(wi);
+	}
+}
+
+static void mlx5e_free_rx_wqes(struct mlx5e_rq *rq, u16 ix, int wqe_bulk)
+{
+	struct mlx5_wq_cyc *wq = &rq->wqe.wq;
+	int i;
+
+	for (i = 0; i < wqe_bulk; i++) {
+		int j = mlx5_wq_cyc_ctr2ix(wq, ix + i);
+		struct mlx5e_wqe_frag_info *wi;
+
+		wi = get_frag(rq, j);
+		mlx5e_free_rx_wqe(rq, wi);
+	}
 }

 static int mlx5e_alloc_rx_wqes(struct mlx5e_rq *rq, u16 ix, int wqe_bulk)
···
 	return i;
 }

+static int mlx5e_refill_rx_wqes(struct mlx5e_rq *rq, u16 ix, int wqe_bulk)
+{
+	int remaining = wqe_bulk;
+	int i = 0;
+
+	/* The WQE bulk is split into smaller bulks that are sized
+	 * according to the page pool cache refill size to avoid overflowing
+	 * the page pool cache due to too many page releases at once.
+	 */
+	do {
+		int refill = min_t(u16, rq->wqe.info.refill_unit, remaining);
+		int alloc_count;
+
+		mlx5e_free_rx_wqes(rq, ix + i, refill);
+		alloc_count = mlx5e_alloc_rx_wqes(rq, ix + i, refill);
+		i += alloc_count;
+		if (unlikely(alloc_count != refill))
+			break;
+
+		remaining -= refill;
+	} while (remaining);
+
+	return i;
+}
+
 static inline void
 mlx5e_add_skb_frag(struct mlx5e_rq *rq, struct sk_buff *skb,
-		   union mlx5e_alloc_unit *au, u32 frag_offset, u32 len,
+		   struct page *page, u32 frag_offset, u32 len,
 		   unsigned int truesize)
 {
-	dma_addr_t addr = page_pool_get_dma_addr(au->page);
+	dma_addr_t addr = page_pool_get_dma_addr(page);

 	dma_sync_single_for_cpu(rq->pdev, addr + frag_offset, len,
 				rq->buff.map_dir);
-	page_ref_inc(au->page);
 	skb_add_rx_frag(skb, skb_shinfo(skb)->nr_frags,
-			au->page, frag_offset, len, truesize);
+			page, frag_offset, len, truesize);
 }

 static inline void
···
 }

 static void
-mlx5e_free_rx_mpwqe(struct mlx5e_rq *rq, struct mlx5e_mpw_info *wi, bool recycle)
+mlx5e_free_rx_mpwqe(struct mlx5e_rq *rq, struct mlx5e_mpw_info *wi)
 {
-	union mlx5e_alloc_unit *alloc_units = wi->alloc_units;
 	bool no_xdp_xmit;
 	int i;

 	/* A common case for AF_XDP. */
-	if (bitmap_full(wi->xdp_xmit_bitmap, rq->mpwqe.pages_per_wqe))
+	if (bitmap_full(wi->skip_release_bitmap, rq->mpwqe.pages_per_wqe))
 		return;

-	no_xdp_xmit = bitmap_empty(wi->xdp_xmit_bitmap, rq->mpwqe.pages_per_wqe);
+	no_xdp_xmit = bitmap_empty(wi->skip_release_bitmap, rq->mpwqe.pages_per_wqe);

 	if (rq->xsk_pool) {
-		/* The `recycle` parameter is ignored, and the page is always
-		 * put into the Reuse Ring, because there is no way to return
-		 * the page to the userspace when the interface goes down.
+		struct xdp_buff **xsk_buffs = wi->alloc_units.xsk_buffs;
+
+		/* The page is always put into the Reuse Ring, because there
+		 * is no way to return the page to userspace when the interface
+		 * goes down.
 		 */
 		for (i = 0; i < rq->mpwqe.pages_per_wqe; i++)
-			if (no_xdp_xmit || !test_bit(i, wi->xdp_xmit_bitmap))
-				xsk_buff_free(alloc_units[i].xsk);
+			if (no_xdp_xmit || !test_bit(i, wi->skip_release_bitmap))
+				xsk_buff_free(xsk_buffs[i]);
 	} else {
-		for (i = 0; i < rq->mpwqe.pages_per_wqe; i++)
-			if (no_xdp_xmit || !test_bit(i, wi->xdp_xmit_bitmap))
-				mlx5e_page_release_dynamic(rq, alloc_units[i].page, recycle);
+		for (i = 0; i < rq->mpwqe.pages_per_wqe; i++) {
+			if (no_xdp_xmit || !test_bit(i, wi->skip_release_bitmap)) {
+				struct mlx5e_frag_page *frag_page;
+
+				frag_page = &wi->alloc_units.frag_pages[i];
+				mlx5e_page_release_fragmented(rq, frag_page);
+			}
+		}
 	}
 }

···
 	struct mlx5e_shampo_hd *shampo = rq->mpwqe.shampo;
 	u16 entries, pi, header_offset, err, wqe_bbs, new_entries;
 	u32 lkey = rq->mdev->mlx5e_res.hw_objs.mkey;
-	struct page *page = shampo->last_page;
+	u16 page_index = shampo->curr_page_index;
+	struct mlx5e_frag_page *frag_page;
 	u64 addr = shampo->last_addr;
 	struct mlx5e_dma_info *dma_info;
 	struct mlx5e_umr_wqe *umr_wqe;
···
 	umr_wqe = mlx5_wq_cyc_get_wqe(&sq->wq, pi);
 	build_klm_umr(sq, umr_wqe, shampo->key, index, entries, wqe_bbs);

+	frag_page = &shampo->pages[page_index];
+
 	for (i = 0; i < entries; i++, index++) {
 		dma_info = &shampo->info[index];
 		if (i >= klm_entries || (index < shampo->pi && shampo->pi - index <
···
 		header_offset = (index & (MLX5E_SHAMPO_WQ_HEADER_PER_PAGE - 1)) <<
 			MLX5E_SHAMPO_LOG_MAX_HEADER_ENTRY_SIZE;
 		if (!(header_offset & (PAGE_SIZE - 1))) {
-			union mlx5e_alloc_unit au;
+			page_index = (page_index + 1) & (shampo->hd_per_wq - 1);
+			frag_page = &shampo->pages[page_index];

-			err = mlx5e_page_alloc_pool(rq, &au);
+			err = mlx5e_page_alloc_fragmented(rq, frag_page);
 			if (unlikely(err))
 				goto err_unmap;
-			page = dma_info->page = au.page;
-			addr = dma_info->addr = page_pool_get_dma_addr(au.page);
+
+			addr = page_pool_get_dma_addr(frag_page->page);
+
+			dma_info->addr = addr;
+			dma_info->frag_page = frag_page;
 		} else {
 			dma_info->addr = addr + header_offset;
-			dma_info->page = page;
+			dma_info->frag_page = frag_page;
 		}

 update_klm:
···
 	};

 	shampo->pi = (shampo->pi + new_entries) & (shampo->hd_per_wq - 1);
-	shampo->last_page = page;
+	shampo->curr_page_index = page_index;
 	shampo->last_addr = addr;
 	sq->pc += wqe_bbs;
 	sq->doorbell_cseg = &umr_wqe->ctrl;
···
 		dma_info = &shampo->info[--index];
 		if (!(i & (MLX5E_SHAMPO_WQ_HEADER_PER_PAGE - 1))) {
 			dma_info->addr = ALIGN_DOWN(dma_info->addr, PAGE_SIZE);
-			mlx5e_page_release_dynamic(rq, dma_info->page, true);
+			mlx5e_page_release_fragmented(rq, dma_info->frag_page);
 		}
 	}
 	rq->stats->buff_alloc_err++;
···
 static int mlx5e_alloc_rx_mpwqe(struct mlx5e_rq *rq, u16 ix)
 {
 	struct mlx5e_mpw_info *wi = mlx5e_get_mpw_info(rq, ix);
-	union mlx5e_alloc_unit *au = &wi->alloc_units[0];
 	struct mlx5e_icosq *sq = rq->icosq;
+	struct mlx5e_frag_page *frag_page;
 	struct mlx5_wq_cyc *wq = &sq->wq;
 	struct mlx5e_umr_wqe *umr_wqe;
 	u32 offset; /* 17-bit value with MTT. */
···
 	umr_wqe = mlx5_wq_cyc_get_wqe(wq, pi);
 	memcpy(umr_wqe, &rq->mpwqe.umr_wqe, sizeof(struct mlx5e_umr_wqe));

-	for (i = 0; i < rq->mpwqe.pages_per_wqe; i++, au++) {
+	frag_page = &wi->alloc_units.frag_pages[0];
+
+	for (i = 0; i < rq->mpwqe.pages_per_wqe; i++, frag_page++) {
 		dma_addr_t addr;

-		err = mlx5e_page_alloc_pool(rq, au);
+		err = mlx5e_page_alloc_fragmented(rq, frag_page);
 		if (unlikely(err))
 			goto err_unmap;
-		addr = page_pool_get_dma_addr(au->page);
+		addr = page_pool_get_dma_addr(frag_page->page);
 		umr_wqe->inline_mtts[i] = (struct mlx5_mtt) {
 			.ptag = cpu_to_be64(addr | MLX5_EN_WR),
 		};
···
 		       sizeof(*umr_wqe->inline_mtts) * pad);
 	}

-	bitmap_zero(wi->xdp_xmit_bitmap, rq->mpwqe.pages_per_wqe);
+	bitmap_zero(wi->skip_release_bitmap, rq->mpwqe.pages_per_wqe);
 	wi->consumed_strides = 0;

 	umr_wqe->ctrl.opmod_idx_opcode =
···

 err_unmap:
 	while (--i >= 0) {
-		au--;
-		mlx5e_page_release_dynamic(rq, au->page, true);
+		frag_page--;
+		mlx5e_page_release_fragmented(rq, frag_page);
 	}

 err:
···
 void mlx5e_shampo_dealloc_hd(struct mlx5e_rq *rq, u16 len, u16 start, bool close)
 {
 	struct mlx5e_shampo_hd *shampo = rq->mpwqe.shampo;
+	struct mlx5e_frag_page *deleted_page = NULL;
 	int hd_per_wq = shampo->hd_per_wq;
-	struct page *deleted_page = NULL;
 	struct mlx5e_dma_info *hd_info;
 	int i, index = start;

···

 		hd_info = &shampo->info[index];
 		hd_info->addr = ALIGN_DOWN(hd_info->addr, PAGE_SIZE);
-		if (hd_info->page != deleted_page) {
-			deleted_page = hd_info->page;
-			mlx5e_page_release_dynamic(rq, hd_info->page, false);
+		if (hd_info->frag_page && hd_info->frag_page != deleted_page) {
+			deleted_page = hd_info->frag_page;
+			mlx5e_page_release_fragmented(rq, hd_info->frag_page);
 		}
+
+		hd_info->frag_page = NULL;
 	}

 	if (start + len > hd_per_wq) {
···
 static void mlx5e_dealloc_rx_mpwqe(struct mlx5e_rq *rq, u16 ix)
 {
 	struct mlx5e_mpw_info *wi = mlx5e_get_mpw_info(rq, ix);
-	/* Don't recycle, this function is called on rq/netdev close */
-	mlx5e_free_rx_mpwqe(rq, wi, false);
+	/* This function is called on rq/netdev close. */
+	mlx5e_free_rx_mpwqe(rq, wi);
 }

 INDIRECT_CALLABLE_SCOPE bool mlx5e_post_rx_wqes(struct mlx5e_rq *rq)
···
 	 */
 	wqe_bulk -= (head + wqe_bulk) & rq->wqe.info.wqe_index_mask;

-	if (!rq->xsk_pool)
-		count = mlx5e_alloc_rx_wqes(rq, head, wqe_bulk);
-	else if (likely(!rq->xsk_pool->dma_need_sync))
+	if (!rq->xsk_pool) {
+		count = mlx5e_refill_rx_wqes(rq, head, wqe_bulk);
+	} else if (likely(!rq->xsk_pool->dma_need_sync)) {
+		mlx5e_xsk_free_rx_wqes(rq, head, wqe_bulk);
 		count = mlx5e_xsk_alloc_rx_wqes_batched(rq, head, wqe_bulk);
-	else
+	} else {
+		mlx5e_xsk_free_rx_wqes(rq, head, wqe_bulk);
 		/* If dma_need_sync is true, it's more efficient to call
 		 * xsk_buff_alloc in a loop, rather than xsk_buff_alloc_batch,
 		 * because the latter does the same check and returns only one
 		 * frame.
 		 */
 		count = mlx5e_xsk_alloc_rx_wqes(rq, head, wqe_bulk);
+	}

 	mlx5_wq_cyc_push_n(wq, count);
 	if (unlikely(count != wqe_bulk)) {
···
 	head = rq->mpwqe.actual_wq_head;
 	i = missing;
 	do {
+		struct mlx5e_mpw_info *wi = mlx5e_get_mpw_info(rq, head);
+
+		/* Deferred free for better page pool cache usage. */
+		mlx5e_free_rx_mpwqe(rq, wi);
+
 		alloc_err = rq->xsk_pool ? mlx5e_xsk_alloc_rx_mpwqe(rq, head) :
 					   mlx5e_alloc_rx_mpwqe(rq, head);

···
 	struct mlx5e_dma_info *last_head = &rq->mpwqe.shampo->info[header_index];
 	u16 head_offset = (last_head->addr & (PAGE_SIZE - 1)) + rq->buff.headroom;

-	return page_address(last_head->page) + head_offset;
+	return page_address(last_head->frag_page->page) + head_offset;
 }

 static void mlx5e_shampo_update_ipv4_udp_hdr(struct mlx5e_rq *rq, struct iphdr *ipv4)
···
 mlx5e_skb_from_cqe_linear(struct mlx5e_rq *rq, struct mlx5e_wqe_frag_info *wi,
 			  struct mlx5_cqe64 *cqe, u32 cqe_bcnt)
 {
-	union mlx5e_alloc_unit *au = wi->au;
+	struct mlx5e_frag_page *frag_page = wi->frag_page;
 	u16 rx_headroom = rq->buff.headroom;
 	struct bpf_prog *prog;
 	struct sk_buff *skb;
···
 	dma_addr_t addr;
 	u32 frag_size;

-	va = page_address(au->page) + wi->offset;
+	va = page_address(frag_page->page) + wi->offset;
 	data = va + rx_headroom;
 	frag_size = MLX5_SKB_FRAG_SZ(rx_headroom + cqe_bcnt);

-	addr = page_pool_get_dma_addr(au->page);
+	addr = page_pool_get_dma_addr(frag_page->page);
 	dma_sync_single_range_for_cpu(rq->pdev, addr, wi->offset,
 				      frag_size, rq->buff.map_dir);
 	net_prefetch(data);
···
 		return NULL;

 	/* queue up for recycling/reuse */
-	page_ref_inc(au->page);
+	skb_mark_for_recycle(skb);
+	frag_page->frags++;

 	return skb;
 }
···
 {
 	struct mlx5e_rq_frag_info *frag_info = &rq->wqe.info.arr[0];
 	struct mlx5e_wqe_frag_info *head_wi = wi;
-	union mlx5e_alloc_unit *au = wi->au;
 	u16 rx_headroom = rq->buff.headroom;
+	struct mlx5e_frag_page *frag_page;
 	struct skb_shared_info *sinfo;
 	struct mlx5e_xdp_buff mxbuf;
 	u32 frag_consumed_bytes;
···
 	u32 truesize;
 	void *va;

-	va = page_address(au->page) + wi->offset;
+	frag_page = wi->frag_page;
+
+	va = page_address(frag_page->page) + wi->offset;
 	frag_consumed_bytes = min_t(u32, frag_info->frag_size, cqe_bcnt);

-	addr = page_pool_get_dma_addr(au->page);
+	addr = page_pool_get_dma_addr(frag_page->page);
 	dma_sync_single_range_for_cpu(rq->pdev, addr, wi->offset,
 				      rq->buff.frame0_sz, rq->buff.map_dir);
 	net_prefetchw(va); /* xdp_frame data area */
···
 	while (cqe_bcnt) {
 		skb_frag_t *frag;

-		au = wi->au;
+		frag_page = wi->frag_page;

 		frag_consumed_bytes = min_t(u32, frag_info->frag_size, cqe_bcnt);

-		addr = page_pool_get_dma_addr(au->page);
+		addr = page_pool_get_dma_addr(frag_page->page);
 		dma_sync_single_for_cpu(rq->pdev, addr + wi->offset,
 					frag_consumed_bytes, rq->buff.map_dir);

···
 		}

 		frag = &sinfo->frags[sinfo->nr_frags++];
-		__skb_frag_set_page(frag, au->page);
+
+		__skb_frag_set_page(frag, frag_page->page);
 		skb_frag_off_set(frag, wi->offset);
 		skb_frag_size_set(frag, frag_consumed_bytes);

-		if (page_is_pfmemalloc(au->page))
+		if (page_is_pfmemalloc(frag_page->page))
 			xdp_buff_set_frag_pfmemalloc(&mxbuf.xdp);

 		sinfo->xdp_frags_size += frag_consumed_bytes;
···
 			int i;

 			for (i = wi - head_wi; i < rq->wqe.info.num_frags; i++)
-				mlx5e_put_rx_frag(rq, &head_wi[i], true);
+				mlx5e_put_rx_frag(rq, &head_wi[i]);
 		}
 		return NULL; /* page/packet was consumed by XDP */
 	}
···
 	if (unlikely(!skb))
 		return NULL;

-	page_ref_inc(head_wi->au->page);
+	skb_mark_for_recycle(skb);
+	head_wi->frag_page->frags++;

 	if (xdp_buff_has_frags(&mxbuf.xdp)) {
-		int i;
-
 		/* sinfo->nr_frags is reset by build_skb, calculate again. */
 		xdp_update_skb_shared_info(skb, wi - head_wi - 1,
 					   sinfo->xdp_frags_size, truesize,
 					   xdp_buff_is_frag_pfmemalloc(&mxbuf.xdp));

-		for (i = 0; i < sinfo->nr_frags; i++) {
-			skb_frag_t *frag = &sinfo->frags[i];
-
-			page_ref_inc(skb_frag_page(frag));
-		}
+		for (struct mlx5e_wqe_frag_info *pwi = head_wi + 1; pwi < wi; pwi++)
+			pwi->frag_page->frags++;
 	}

 	return skb;
···

 	if (unlikely(MLX5E_RX_ERR_CQE(cqe))) {
 		mlx5e_handle_rx_err_cqe(rq, cqe);
-		goto free_wqe;
+		goto wq_cyc_pop;
 	}

 	skb = INDIRECT_CALL_3(rq->wqe.skb_from_cqe,
···
 			/* do not return page to cache,
 			 * it will be returned on XDP_TX completion.
 			 */
-			goto wq_cyc_pop;
+			wi->flags |= BIT(MLX5E_WQE_FRAG_SKIP_RELEASE);
 		}
-		goto free_wqe;
+		goto wq_cyc_pop;
 	}

 	mlx5e_complete_rx_cqe(rq, cqe, cqe_bcnt, skb);
···
 	if (mlx5e_cqe_regb_chain(cqe))
 		if (!mlx5e_tc_update_skb_nic(cqe, skb)) {
 			dev_kfree_skb_any(skb);
-			goto free_wqe;
+			goto wq_cyc_pop;
 		}

 	napi_gro_receive(rq->cq.napi, skb);

-free_wqe:
-	mlx5e_free_rx_wqe(rq, wi, true);
 wq_cyc_pop:
 	mlx5_wq_cyc_pop(wq);
 }
···

 	if (unlikely(MLX5E_RX_ERR_CQE(cqe))) {
 		mlx5e_handle_rx_err_cqe(rq, cqe);
-		goto free_wqe;
+		goto wq_cyc_pop;
 	}

 	skb = INDIRECT_CALL_2(rq->wqe.skb_from_cqe,
···
 			/* do not return page to cache,
 			 * it will be returned on XDP_TX completion.
 			 */
-			goto wq_cyc_pop;
+			wi->flags |= BIT(MLX5E_WQE_FRAG_SKIP_RELEASE);
 		}
-		goto free_wqe;
+		goto wq_cyc_pop;
 	}

 	mlx5e_complete_rx_cqe(rq, cqe, cqe_bcnt, skb);
···

 	mlx5e_rep_tc_receive(cqe, rq, skb);

-free_wqe:
-	mlx5e_free_rx_wqe(rq, wi, true);
 wq_cyc_pop:
 	mlx5_wq_cyc_pop(wq);
 }
···

 	wq = &rq->mpwqe.wq;
 	wqe = mlx5_wq_ll_get_wqe(wq, wqe_id);
-	mlx5e_free_rx_mpwqe(rq, wi, true);
 	mlx5_wq_ll_pop(wq, cqe->wqe_id, &wqe->next.next_wqe_index);
 }

···

 static void
 mlx5e_fill_skb_data(struct sk_buff *skb, struct mlx5e_rq *rq,
-		    union mlx5e_alloc_unit *au, u32 data_bcnt, u32 data_offset)
+		    struct mlx5e_frag_page *frag_page,
+		    u32 data_bcnt, u32 data_offset)
 {
 	net_prefetchw(skb->data);

···
 		else
 			truesize = ALIGN(pg_consumed_bytes, BIT(rq->mpwqe.log_stride_sz));

-		mlx5e_add_skb_frag(rq, skb, au, data_offset,
+		frag_page->frags++;
+		mlx5e_add_skb_frag(rq, skb, frag_page->page, data_offset,
 				   pg_consumed_bytes, truesize);

 		data_bcnt -= pg_consumed_bytes;
 		data_offset = 0;
-		au++;
+		frag_page++;
 	}
 }

···
 				  struct mlx5_cqe64 *cqe, u16 cqe_bcnt, u32 head_offset,
 				  u32 page_idx)
 {
-	union mlx5e_alloc_unit *au = &wi->alloc_units[page_idx];
+	struct mlx5e_frag_page *frag_page = &wi->alloc_units.frag_pages[page_idx];
 	u16 headlen = min_t(u16, MLX5E_RX_MAX_HEAD, cqe_bcnt);
+	struct mlx5e_frag_page *head_page = frag_page;
 	u32 frag_offset = head_offset + headlen;
 	u32 byte_cnt = cqe_bcnt - headlen;
-	union mlx5e_alloc_unit *head_au = au;
 	struct sk_buff *skb;
 	dma_addr_t addr;

···

 	/* Non-linear mode, hence non-XSK, which always uses PAGE_SIZE. */
 	if (unlikely(frag_offset >= PAGE_SIZE)) {
-		au++;
+		frag_page++;
 		frag_offset -= PAGE_SIZE;
 	}

-	mlx5e_fill_skb_data(skb, rq, au, byte_cnt, frag_offset);
+	skb_mark_for_recycle(skb);
+	mlx5e_fill_skb_data(skb, rq, frag_page, byte_cnt, frag_offset);
 	/* copy header */
-	addr = page_pool_get_dma_addr(head_au->page);
-	mlx5e_copy_skb_header(rq, skb, head_au->page, addr,
+	addr = page_pool_get_dma_addr(head_page->page);
+	mlx5e_copy_skb_header(rq, skb, head_page->page, addr,
 			      head_offset, head_offset, headlen);
 	/* skb linear part was allocated with headlen and aligned to long */
 	skb->tail += headlen;
···
 				struct mlx5_cqe64 *cqe, u16 cqe_bcnt, u32 head_offset,
 				u32 page_idx)
 {
-	union mlx5e_alloc_unit *au = &wi->alloc_units[page_idx];
+	struct mlx5e_frag_page *frag_page = &wi->alloc_units.frag_pages[page_idx];
 	u16 rx_headroom = rq->buff.headroom;
 	struct bpf_prog *prog;
 	struct sk_buff *skb;
···
 		return NULL;
 	}

-	va = page_address(au->page) + head_offset;
+	va = page_address(frag_page->page) + head_offset;
 	data = va + rx_headroom;
 	frag_size = MLX5_SKB_FRAG_SZ(rx_headroom + cqe_bcnt);

-	addr = page_pool_get_dma_addr(au->page);
+	addr = page_pool_get_dma_addr(frag_page->page);
 	dma_sync_single_range_for_cpu(rq->pdev, addr, head_offset,
 				      frag_size, rq->buff.map_dir);
 	net_prefetch(data);
···
 		mlx5e_fill_mxbuf(rq, cqe, va, rx_headroom, cqe_bcnt, &mxbuf);
 		if (mlx5e_xdp_handle(rq, prog, &mxbuf)) {
 			if (__test_and_clear_bit(MLX5E_RQ_FLAG_XDP_XMIT, rq->flags))
-				__set_bit(page_idx, wi->xdp_xmit_bitmap); /* non-atomic */
+				__set_bit(page_idx, wi->skip_release_bitmap); /* non-atomic */
 			return NULL; /* page/packet was consumed by XDP */
 		}

···
 		return NULL;

 	/* queue up for recycling/reuse */
-	page_ref_inc(au->page);
+	skb_mark_for_recycle(skb);
+	frag_page->frags++;

 	return skb;
 }
···
 	void *hdr, *data;
 	u32 frag_size;

-	hdr = page_address(head->page) + head_offset;
+	hdr = page_address(head->frag_page->page) + head_offset;
 	data = hdr + rx_headroom;
 	frag_size = MLX5_SKB_FRAG_SZ(rx_headroom + head_size);

···
 		if (unlikely(!skb))
 			return NULL;

-		/* queue up for recycling/reuse */
-		page_ref_inc(head->page);
-
+		head->frag_page->frags++;
 	} else {
 		/* allocate SKB and copy header for large header */
 		rq->stats->gro_large_hds++;
···
 		}

 		prefetchw(skb->data);
-		mlx5e_copy_skb_header(rq, skb, head->page, head->addr,
+		mlx5e_copy_skb_header(rq, skb, head->frag_page->page, head->addr,
 				      head_offset + rx_headroom,
 				      rx_headroom, head_size);
 		/* skb linear part was allocated with headlen and aligned to long */
 		skb->tail += head_size;
 		skb->len += head_size;
 	}
+
+	/* queue up for recycling/reuse */
+	skb_mark_for_recycle(skb);
+
 	return skb;
 }

···
 	u64 addr = shampo->info[header_index].addr;

 	if (((header_index + 1) & (MLX5E_SHAMPO_WQ_HEADER_PER_PAGE - 1)) == 0) {
-		shampo->info[header_index].addr = ALIGN_DOWN(addr, PAGE_SIZE);
-		mlx5e_page_release_dynamic(rq, shampo->info[header_index].page, true);
+		struct mlx5e_dma_info *dma_info = &shampo->info[header_index];
+
+		dma_info->addr = ALIGN_DOWN(addr, PAGE_SIZE);
+		mlx5e_page_release_fragmented(rq, dma_info->frag_page);
 	}
 	bitmap_clear(shampo->bitmap, header_index, 1);
 }
···
 	bool match = cqe->shampo.match;
 	struct mlx5e_rq_stats *stats = rq->stats;
 	struct mlx5e_rx_wqe_ll *wqe;
-	union mlx5e_alloc_unit *au;
 	struct mlx5e_mpw_info *wi;
 	struct mlx5_wq_ll *wq;

···
 	}

 	if (likely(head_size)) {
-		au = &wi->alloc_units[page_idx];
-		mlx5e_fill_skb_data(*skb, rq, au, data_bcnt, data_offset);
+		struct mlx5e_frag_page *frag_page;
+
+		frag_page = &wi->alloc_units.frag_pages[page_idx];
+		mlx5e_fill_skb_data(*skb, rq, frag_page, data_bcnt, data_offset);
 	}

 	mlx5e_shampo_complete_rx_cqe(rq, cqe, cqe_bcnt, *skb);
···

 	wq = &rq->mpwqe.wq;
 	wqe = mlx5_wq_ll_get_wqe(wq, wqe_id);
-	mlx5e_free_rx_mpwqe(rq, wi, true);
 	mlx5_wq_ll_pop(wq, cqe->wqe_id, &wqe->next.next_wqe_index);
 }

···

 	wq = &rq->mpwqe.wq;
 	wqe = mlx5_wq_ll_get_wqe(wq, wqe_id);
-	mlx5e_free_rx_mpwqe(rq, wi, true);
 	mlx5_wq_ll_pop(wq, cqe->wqe_id, &wqe->next.next_wqe_index);
 }

···

 	if (unlikely(MLX5E_RX_ERR_CQE(cqe))) {
 		rq->stats->wqe_err++;
-		goto wq_free_wqe;
+		goto wq_cyc_pop;
 	}

 	skb = INDIRECT_CALL_2(rq->wqe.skb_from_cqe,
···
 			      mlx5e_skb_from_cqe_nonlinear,
 			      rq, wi, cqe, cqe_bcnt);
 	if (!skb)
-		goto wq_free_wqe;
+		goto wq_cyc_pop;

 	mlx5i_complete_rx_cqe(rq, cqe, cqe_bcnt, skb);
 	if (unlikely(!skb->dev)) {
 		dev_kfree_skb_any(skb);
-		goto wq_free_wqe;
+		goto wq_cyc_pop;
 	}
 	napi_gro_receive(rq->cq.napi, skb);

-wq_free_wqe:
-	mlx5e_free_rx_wqe(rq, wi, true);
+wq_cyc_pop:
 	mlx5_wq_cyc_pop(wq);
 }

···

 	if (unlikely(MLX5E_RX_ERR_CQE(cqe))) {
 		rq->stats->wqe_err++;
-		goto free_wqe;
+		goto wq_cyc_pop;
 	}

 	skb = mlx5e_skb_from_cqe_nonlinear(rq, wi, cqe, cqe_bcnt);
 	if (!skb)
-		goto free_wqe;
+		goto wq_cyc_pop;

 	mlx5e_complete_rx_cqe(rq, cqe, cqe_bcnt, skb);
 	skb_push(skb, ETH_HLEN);
···
 				 rq->netdev->devlink_port);
 	dev_kfree_skb_any(skb);

-free_wqe:
-	mlx5e_free_rx_wqe(rq, wi, false);
+wq_cyc_pop:
 	mlx5_wq_cyc_pop(wq);
 }

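The fragment accounting that replaces `page_ref_inc()` in the diff above can be illustrated with a small userspace model. This is a sketch under stated assumptions, not the page_pool implementation: every `toy_*` and `demo_*` name is hypothetical, `TOY_BIAS_MAX` assumes 4 KiB pages, and the model folds "drop references" and "return the page to the pool" into a single helper, whereas the driver calls `page_pool_defrag_page()` and `page_pool_put_defragged_page()` separately. The idea it shows: the page starts with a large bias of references, the driver counts how many fragments it handed to the stack, and on release it drains the bias minus that count, so whichever side drops the last reference recycles the page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for MLX5E_PAGECNT_BIAS_MAX (PAGE_SIZE / 64), assuming 4 KiB pages. */
#define TOY_BIAS_MAX (4096 / 64)

struct toy_page {
	long frag_count;   /* outstanding fragment references */
	bool back_in_pool; /* set once the last reference is dropped */
};

struct toy_frag_page {
	struct toy_page *page;
	uint16_t frags;    /* fragments handed to the stack so far */
};

/* Models allocation plus page_pool_fragment_page(): start at the full bias. */
static void toy_alloc_fragmented(struct toy_page *page, struct toy_frag_page *fp)
{
	page->frag_count = TOY_BIAS_MAX;
	page->back_in_pool = false;
	fp->page = page;
	fp->frags = 0;
}

/* Drop nr references; the page goes back to the pool when none are left.
 * Returns the number of references still outstanding.
 */
static long toy_drop_refs(struct toy_page *page, long nr)
{
	page->frag_count -= nr;
	if (page->frag_count == 0)
		page->back_in_pool = true;
	return page->frag_count;
}

/* Driver side: one fragment of the page travels up the stack in an skb. */
static void toy_give_frag_to_stack(struct toy_frag_page *fp) { fp->frags++; }

/* Driver side: give up every reference that was never handed out. */
static long toy_release_fragmented(struct toy_frag_page *fp)
{
	uint16_t drain_count = TOY_BIAS_MAX - fp->frags;

	return toy_drop_refs(fp->page, drain_count);
}

/* Stack side: an skb fragment pointing into the page is freed. */
static long toy_stack_frees_frag(struct toy_page *page)
{
	return toy_drop_refs(page, 1);
}

/* No fragment was handed out: the driver's release recycles immediately. */
static bool demo_unused_page_recycles_at_once(void)
{
	struct toy_page page;
	struct toy_frag_page fp;

	toy_alloc_fragmented(&page, &fp);
	toy_release_fragmented(&fp);
	return page.back_in_pool;
}

/* Three fragments are in flight when the driver lets go of the page. */
static long demo_refs_left_after_release(void)
{
	struct toy_page page;
	struct toy_frag_page fp;
	int i;

	toy_alloc_fragmented(&page, &fp);
	for (i = 0; i < 3; i++)
		toy_give_frag_to_stack(&fp);
	return toy_release_fragmented(&fp);
}

/* The page is recycled only when the stack frees the last fragment. */
static bool demo_last_stack_free_recycles(void)
{
	struct toy_page page;
	struct toy_frag_page fp;
	int i;

	toy_alloc_fragmented(&page, &fp);
	for (i = 0; i < 3; i++)
		toy_give_frag_to_stack(&fp);
	toy_release_fragmented(&fp);
	assert(!page.back_in_pool);

	for (i = 0; i < 3; i++)
		toy_stack_frees_frag(&page);
	return page.back_in_pool;
}
```

In this model the hot path only increments the plain `frags` field on the driver's own bookkeeping struct; the shared counter on the page is touched once at allocation and once at release. That is consistent with the series description of using a fragment counter "instead of tracking via page ref", though any performance reasoning beyond what the commit message states is an inference.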
-20
drivers/net/ethernet/mellanox/mlx5/core/en_stats.c
···
 	{ MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, rx_buff_alloc_err) },
 	{ MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, rx_cqe_compress_blks) },
 	{ MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, rx_cqe_compress_pkts) },
-	{ MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, rx_cache_reuse) },
-	{ MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, rx_cache_full) },
-	{ MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, rx_cache_empty) },
-	{ MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, rx_cache_busy) },
-	{ MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, rx_cache_waive) },
 	{ MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, rx_congst_umr) },
 	{ MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, rx_arfs_err) },
 	{ MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, rx_recover) },
···
 	s->rx_buff_alloc_err += rq_stats->buff_alloc_err;
 	s->rx_cqe_compress_blks += rq_stats->cqe_compress_blks;
 	s->rx_cqe_compress_pkts += rq_stats->cqe_compress_pkts;
-	s->rx_cache_reuse += rq_stats->cache_reuse;
-	s->rx_cache_full += rq_stats->cache_full;
-	s->rx_cache_empty += rq_stats->cache_empty;
-	s->rx_cache_busy += rq_stats->cache_busy;
-	s->rx_cache_waive += rq_stats->cache_waive;
 	s->rx_congst_umr += rq_stats->congst_umr;
 	s->rx_arfs_err += rq_stats->arfs_err;
 	s->rx_recover += rq_stats->recover;
···
 	{ MLX5E_DECLARE_RX_STAT(struct mlx5e_rq_stats, buff_alloc_err) },
 	{ MLX5E_DECLARE_RX_STAT(struct mlx5e_rq_stats, cqe_compress_blks) },
 	{ MLX5E_DECLARE_RX_STAT(struct mlx5e_rq_stats, cqe_compress_pkts) },
-	{ MLX5E_DECLARE_RX_STAT(struct mlx5e_rq_stats, cache_reuse) },
-	{ MLX5E_DECLARE_RX_STAT(struct mlx5e_rq_stats, cache_full) },
-	{ MLX5E_DECLARE_RX_STAT(struct mlx5e_rq_stats, cache_empty) },
-	{ MLX5E_DECLARE_RX_STAT(struct mlx5e_rq_stats, cache_busy) },
-	{ MLX5E_DECLARE_RX_STAT(struct mlx5e_rq_stats, cache_waive) },
 	{ MLX5E_DECLARE_RX_STAT(struct mlx5e_rq_stats, congst_umr) },
 	{ MLX5E_DECLARE_RX_STAT(struct mlx5e_rq_stats, arfs_err) },
 	{ MLX5E_DECLARE_RX_STAT(struct mlx5e_rq_stats, recover) },
···
 	{ MLX5E_DECLARE_PTP_RQ_STAT(struct mlx5e_rq_stats, buff_alloc_err) },
 	{ MLX5E_DECLARE_PTP_RQ_STAT(struct mlx5e_rq_stats, cqe_compress_blks) },
 	{ MLX5E_DECLARE_PTP_RQ_STAT(struct mlx5e_rq_stats, cqe_compress_pkts) },
-	{ MLX5E_DECLARE_PTP_RQ_STAT(struct mlx5e_rq_stats, cache_reuse) },
-	{ MLX5E_DECLARE_PTP_RQ_STAT(struct mlx5e_rq_stats, cache_full) },
-	{ MLX5E_DECLARE_PTP_RQ_STAT(struct mlx5e_rq_stats, cache_empty) },
-	{ MLX5E_DECLARE_PTP_RQ_STAT(struct mlx5e_rq_stats, cache_busy) },
-	{ MLX5E_DECLARE_PTP_RQ_STAT(struct mlx5e_rq_stats, cache_waive) },
 	{ MLX5E_DECLARE_PTP_RQ_STAT(struct mlx5e_rq_stats, congst_umr) },
 	{ MLX5E_DECLARE_PTP_RQ_STAT(struct mlx5e_rq_stats, arfs_err) },
 	{ MLX5E_DECLARE_PTP_RQ_STAT(struct mlx5e_rq_stats, recover) },
-10
drivers/net/ethernet/mellanox/mlx5/core/en_stats.h
···
 	u64 rx_buff_alloc_err;
 	u64 rx_cqe_compress_blks;
 	u64 rx_cqe_compress_pkts;
-	u64 rx_cache_reuse;
-	u64 rx_cache_full;
-	u64 rx_cache_empty;
-	u64 rx_cache_busy;
-	u64 rx_cache_waive;
 	u64 rx_congst_umr;
 	u64 rx_arfs_err;
 	u64 rx_recover;
···
 	u64 buff_alloc_err;
 	u64 cqe_compress_blks;
 	u64 cqe_compress_pkts;
-	u64 cache_reuse;
-	u64 cache_full;
-	u64 cache_empty;
-	u64 cache_busy;
-	u64 cache_waive;
 	u64 congst_umr;
 	u64 arfs_err;
 	u64 recover;